[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DrFranData/PfDA/blob/main/Topic2.ipynb)
 

# 🚗 Topic 2: Loading and Inspecting Data with Pandas

In this lesson, we’ll load a real-world dataset using **pandas**, and explore it to understand its shape, structure, and contents.

**Dataset:** Auto MPG, a table of fuel efficiency and car features for models from the 1970s and early 1980s

## 🧠 Learning Objectives
- Load a dataset from a URL
- View the top and bottom of the dataset
- Get summary information
- Understand rows, columns, and data types
- Perform basic exploration

## 📥 Step 1: Load the Dataset

We’ll use `pandas` to read the dataset from a URL.

In [None]:
import pandas as pd

url = "https://raw.githubusercontent.com/plotly/datasets/master/auto-mpg.csv"
df = pd.read_csv(url)
df.head()  # Show the first 5 rows
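`pd.read_csv` accepts a URL, a local file path, or a file-like object, and parses all three the same way. As a minimal sketch, the cell below reads a small inline CSV so it runs without a network connection. The three rows and the name `toy` are made up for illustration and are not the real dataset.

```python
import io

import pandas as pd

# Hypothetical three-row CSV standing in for the real file
csv_text = """mpg,cylinders,weight
18.0,8,3504
15.0,8,3693
24.0,4,2372
"""

# read_csv treats a file-like object the same way it treats a URL
toy = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows; with no argument it returns 5
print(toy.head(2))
```

The first line of the CSV becomes the column names, which is also what happens when loading from the URL.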

## 🔍 Step 2: Look at the Whole Dataset

Let’s explore the dataset using some pandas commands.

In [None]:
# See how many rows and columns
df.shape

In [None]:
# See column names
df.columns

In [None]:
# Show the last 5 rows
df.tail()

In [None]:
# Get a random sample of 5 rows
df.sample(5)
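Two details of these commands are easy to miss: `shape` is an attribute (no parentheses) that holds a `(rows, columns)` tuple, and `sample()` returns different rows on each run unless you fix the seed. The sketch below shows both on a small made-up table named `toy`, which is an assumption for illustration only.

```python
import pandas as pd

# Hypothetical four-row table used only for this illustration
toy = pd.DataFrame({
    "mpg": [18.0, 15.0, 24.0, 36.0],
    "name": ["car a", "car b", "car c", "car d"],
})

# shape is a (rows, columns) tuple, so it can be unpacked
n_rows, n_cols = toy.shape
print(n_rows, n_cols)           # 4 2

# random_state fixes the seed so the same rows come back on every run
first = toy.sample(2, random_state=0)
second = toy.sample(2, random_state=0)
print(first.equals(second))     # True
```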

## ℹ️ Step 3: Understand the Columns

Each row in the dataset describes one car, and each column holds one attribute of that car.

In [None]:
# Get a summary: column names, non-null counts, and data types
df.info()

In [None]:
# Get descriptive statistics for numeric columns
df.describe()
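When reading the output of `info()` and `describe()`, it helps to know what to look for: the data type of each column, how many values are missing, and which columns `describe()` leaves out. The sketch below uses a small made-up table (`toy`, with one value deliberately left missing) rather than the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical table with one deliberately missing horsepower value
toy = pd.DataFrame({
    "mpg": [18.0, 15.0, 24.0],
    "horsepower": [130.0, np.nan, 95.0],
    "name": ["car a", "car b", "car c"],
})

print(toy.dtypes)              # data type of each column

missing = toy.isna().sum()     # number of missing values per column
print(missing)

# describe() summarises only the numeric columns by default
stats = toy.describe()
print(list(stats.columns))     # ['mpg', 'horsepower']
```

The `count` row of `describe()` counts only non-missing values, so a count lower than the number of rows is a quick sign that a column has gaps.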

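**Column meanings:**
- `mpg`: Miles per gallon (fuel efficiency)
- `cylinders`: Number of engine cylinders
- `displacement`: Engine volume (cubic inches)
- `horsepower`: Engine power output
- `weight`: Vehicle weight (pounds)
- `acceleration`: Time to accelerate from 0 to 60 mph (seconds)
- `model_year`: Year of the car (last two digits)
- `origin`: Region (1 = USA, 2 = Europe, 3 = Asia)
- `name`: Name of the car

Copies of the Auto MPG dataset differ in which columns they include and how they spell the column names, so compare this list with the output of `df.columns` before relying on it.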
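A coded column such as `origin` is easier to read once the numbers are translated into labels. As a minimal sketch, the cell below applies the code-to-region mapping from the list above to a made-up column. The names `toy`, `origin_labels`, and `origin_label` are hypothetical.

```python
import pandas as pd

# Hypothetical column of origin codes
toy = pd.DataFrame({"origin": [1, 3, 2, 1]})

# Code-to-label mapping copied from the column list above
origin_labels = {1: "USA", 2: "Europe", 3: "Asia"}

# map() looks up each value in the dict; codes missing from the dict become NaN
toy["origin_label"] = toy["origin"].map(origin_labels)
print(toy)
```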

## ✍️ Exercise 1: Explore the Dataset

Try answering the questions below by writing and running code in the next cells.

1. How many rows are in the dataset?
2. What are the names of the columns?
3. What are the top 3 most common car names?
4. How many unique values are there in the `origin` column?
5. What is the average mpg of all cars?

In [None]:
# Your answers here
# 1. Total rows
df.shape[0]

In [None]:
# 2. Column names
df.columns.tolist()

In [None]:
# 3. Top 3 car names
df['name'].value_counts().head(3)

In [None]:
# 4. Unique values in origin
df['origin'].nunique()

In [None]:
# 5. Average MPG
df['mpg'].mean()
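To see exactly what `value_counts()`, `nunique()`, and `mean()` return, it can help to run them on a table small enough to check by hand. The six rows below are made up for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical six-row table, small enough to verify by eye
toy = pd.DataFrame({
    "name": ["car a", "car b", "car a", "car c", "car a", "car b"],
    "origin": [1, 1, 2, 3, 1, 2],
    "mpg": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# value_counts() sorts from most to least frequent, so head(2) is the top 2
top = toy["name"].value_counts().head(2)
print(top.to_dict())              # {'car a': 3, 'car b': 2}

print(toy["origin"].nunique())    # 3 (the distinct values 1, 2, 3)
print(toy["mpg"].mean())          # 35.0
```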

## 🧾 Bonus: Slicing and Selecting Data

Now that you’ve seen the data, let’s learn how to select specific **rows**, **columns**, and **combinations**.

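We’ll use the `.loc[]` and `.iloc[]` accessors in pandas. `.iloc[]` selects by integer position and excludes the end of a slice, while `.loc[]` selects by index label and includes it. With the default index, labels and positions are the same numbers, so the main difference you will notice is whether the last row is included.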

In [None]:
# Select a single column (as a Series)
df['mpg'].head()

In [None]:
# Select multiple columns
df[['mpg', 'horsepower', 'weight']].head()

In [None]:
# Select rows 0 to 4 (inclusive of 0, exclusive of 5)
df.iloc[0:5]

In [None]:
# Select rows labeled 0 through 4 (.loc includes the end label) and only the mpg and name columns
df.loc[0:4, ['mpg', 'name']]
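The difference between `.iloc[]` and `.loc[]` is easiest to see when the index labels are not the same as the row positions. The sketch below uses a made-up table whose index labels (10, 20, 30, ...) were chosen for that purpose.

```python
import pandas as pd

# Hypothetical table with index labels that differ from row positions
toy = pd.DataFrame(
    {"mpg": [18.0, 15.0, 24.0, 36.0, 29.0]},
    index=[10, 20, 30, 40, 50],
)

by_position = toy.iloc[0:2]   # positions 0 and 1; the end position is excluded
by_label = toy.loc[10:30]     # labels 10 through 30; the end label is included

print(len(by_position))       # 2
print(len(by_label))          # 3
```

This is why `df.iloc[0:5]` and `df.loc[0:4]` return the same five rows on a DataFrame with the default index.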

## ✍️ Exercise 2: Try Some Slicing

Use the cells below to practice:

1. Show the `horsepower` of the first 10 cars.
2. Show `name`, `mpg`, and `weight` for rows 10 to 15 (inclusive).
3. Show all cars with an MPG greater than 35.

In [None]:
# 1. Horsepower of first 10 cars
df['horsepower'].iloc[:10]

In [None]:
# 2. Selected columns and rows 10–15
df.loc[10:15, ['name', 'mpg', 'weight']]

In [None]:
# 3. Cars with mpg > 35
df[df['mpg'] > 35]
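The filter above works because `df['mpg'] > 35` produces a Series of True/False values, one per row, and indexing with that Series keeps only the True rows. As a sketch on a made-up table, the cell below shows the intermediate mask and how two conditions can be combined.

```python
import pandas as pd

# Hypothetical four-row table for illustration
toy = pd.DataFrame({
    "mpg": [18.0, 36.0, 40.0, 38.0],
    "weight": [3500, 2100, 1800, 2600],
})

# A comparison returns a boolean Series with one value per row
mask = toy["mpg"] > 35
print(mask.tolist())            # [False, True, True, True]

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
light_and_efficient = toy[(toy["mpg"] > 35) & (toy["weight"] < 2500)]
print(light_and_efficient.index.tolist())   # [1, 2]
```

The parentheses matter because `&` binds more tightly than `>` and `<` in Python, so leaving them out raises an error or gives the wrong result.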

You now know how to access and explore different parts of your dataset, which is a key skill in any analysis.

## ✅ Summary

- We used `pandas` to load a dataset from a URL
- Explored the size, shape, and contents of the data
- Viewed summary statistics and sample records
- Practiced answering simple analytical questions
- Selected rows and columns with `.loc[]`, `.iloc[]`, and boolean filters

In the next topic, we will **clean and prepare the data** so it's ready for deeper analysis!