# Pandas Basics Tutorial



## Learning Objectives
By the end of this lesson you will be able to:
1. Create and inspect **Series** and **DataFrame** objects.
2. Load data from common file formats with `pd.read_*` helpers.
3. Select and filter data using `.loc` and `.iloc`.
4. Perform basic transforms, aggregations, and visualizations.
5. Handle missing values effectively.


In [None]:
!pip install matplotlib
!pip install numpy
!pip install pandas

# The commands above install the required packages in your local environment.



In [None]:
import pandas as pd
import numpy as np
pd.__version__

## 1. Series and DataFrame objects

In [None]:
s = pd.Series([1, 3, 5, np.nan, 6, 8], name='MySeries')
s
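
A few properties of the Series above are worth checking directly. The following is a minimal sketch that rebuilds the same Series; the variable names `doubled`, `total`, and `average` are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8], name='MySeries')

# Because of the NaN, pandas stores the values as float64.
print(s.dtype)

# With no explicit index, pandas assigns integer labels starting at 0.
print(s.index)

# Arithmetic is element-wise, and NaN propagates through it.
doubled = s * 2

# Reductions such as sum() and mean() skip NaN by default.
total = s.sum()
average = s.mean()
print(doubled)
print(total, average)
```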

In [None]:
data = {
    'A': 1.,
    'B': pd.Timestamp('2025-01-01'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(['test', 'train', 'test', 'train']),
    'F': 'foo'
}
df = pd.DataFrame(data)
df
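
Scalars such as `1.` and `'foo'` are broadcast to every row, while each column keeps its own dtype. As a rough sketch, the same DataFrame can be rebuilt and checked like this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': 1.,
    'B': pd.Timestamp('2025-01-01'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(['test', 'train', 'test', 'train']),
    'F': 'foo'
})

# shape is (number of rows, number of columns).
print(df.shape)

# Each column has its own dtype; scalars were repeated for all four rows.
print(df.dtypes)
```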

## 2. Loading data from files

In [None]:
from io import StringIO
csv_data = '''name,age,city
Alice,25,New York
Bob,30,Paris
Charlie,35,London'''

people = pd.read_csv(StringIO(csv_data))
people
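
`pd.read_csv` accepts many optional parameters. The sketch below shows three common ones (`usecols`, `index_col`, and `dtype`) on the same in-memory CSV; the name `people_by_name` is only illustrative.

```python
from io import StringIO

import pandas as pd

csv_data = '''name,age,city
Alice,25,New York
Bob,30,Paris
Charlie,35,London'''

people_by_name = pd.read_csv(
    StringIO(csv_data),
    usecols=['name', 'age'],   # read only these columns
    index_col='name',          # use 'name' as the row index
    dtype={'age': 'float64'},  # parse 'age' as float instead of int
)
print(people_by_name)
```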

## 3. Inspecting data

In [None]:
people.head()

In [None]:
people.info()

In [None]:
people.describe()
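
By default `describe()` summarizes only numeric columns. A short sketch of a few related inspection tools, rebuilding the same `people` DataFrame so the cell runs on its own:

```python
from io import StringIO

import pandas as pd

csv_data = '''name,age,city
Alice,25,New York
Bob,30,Paris
Charlie,35,London'''
people = pd.read_csv(StringIO(csv_data))

# Number of rows and columns, and the dtype of each column.
print(people.shape)
print(people.dtypes)

# Numeric summary only: 'age' is the single numeric column here.
stats = people.describe()

# include='all' also summarizes text columns (count, unique, top, freq).
all_stats = people.describe(include='all')
print(all_stats)
```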

## 4. Selecting data – `.loc` and `.iloc`

In [None]:
people.loc[people['age'] > 28, ['name', 'city']]

In [None]:
people.iloc[0:2, 0:2]
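
One difference that often surprises newcomers: `.loc` slices by label and includes the end label, while `.iloc` slices by position and excludes the stop position. A minimal sketch on the same data (the names `by_label`, `by_position`, and `named` are illustrative):

```python
from io import StringIO

import pandas as pd

csv_data = '''name,age,city
Alice,25,New York
Bob,30,Paris
Charlie,35,London'''
people = pd.read_csv(StringIO(csv_data))

# .loc slices by label and includes both endpoints: row labels 0, 1 and 2.
by_label = people.loc[0:2, 'name':'age']

# .iloc slices by integer position and excludes the stop: positions 0 and 1.
by_position = people.iloc[0:2, 0:2]

# With a non-numeric index the difference is easier to see.
named = people.set_index('name')
alice_to_bob = named.loc['Alice':'Bob']  # label slice, 'Bob' is included
first_row = named.iloc[0:1]              # position slice, only 'Alice'

print(by_label)
print(by_position)
```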


## 5. Basic operations

In [None]:
people['age_plus_10'] = people['age'] + 10
people

In [None]:
people.sort_values('age', ascending=False)
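
Aggregations usually go through `groupby`. Since every city in `people` appears only once, the sketch below uses a small made-up `sales` table (hypothetical data, not part of the lesson's dataset) to show the pattern.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration.
sales = pd.DataFrame({
    'city': ['Paris', 'Paris', 'London', 'London', 'London'],
    'amount': [10, 20, 5, 15, 10],
})

# One aggregate per group: total amount per city.
totals = sales.groupby('city')['amount'].sum()

# Several aggregates at once: one column per function.
summary = sales.groupby('city')['amount'].agg(['mean', 'max'])

print(totals)
print(summary)
```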

## 6. Handling missing data

In [None]:
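df2 = df.copy()
# Column 'D' is int32, which cannot hold NaN; cast it to float before inserting one.
df2['D'] = df2['D'].astype('float64')
df2.loc[1, 'A'] = np.nan
df2.loc[2, 'D'] = np.nan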
df2

In [None]:
df2.dropna()

In [None]:
df2.fillna({'A': df2['A'].mean(), 'D': 0})
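
Before dropping or filling, it helps to count the missing values. Note also that `dropna` and `fillna` return new objects and leave the original untouched. A minimal sketch, using a reduced two-column stand-in for `df2` so that it runs on its own:

```python
import numpy as np
import pandas as pd

# Reduced stand-in for df2: only the two columns that contain NaN.
df2 = pd.DataFrame({
    'A': [1.0, np.nan, 1.0, 1.0],
    'D': [3.0, 3.0, np.nan, 3.0],
})

# Count missing values per column.
missing_per_column = df2.isna().sum()

# Drop rows with a NaN in any column, or only in selected columns.
dropped_any = df2.dropna()
dropped_a = df2.dropna(subset=['A'])

# Fill each column with its own replacement value.
filled = df2.fillna({'A': df2['A'].mean(), 'D': 0})

print(missing_per_column)
print(filled)
```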

## 7. Hands-on Exercises

Add new cells below and solve the following tasks:
1. **Load a CSV file**: Use `pd.read_csv` to load the Iris dataset (you can fetch it from the UCI repository) and display its first five rows.
2. **Compute statistics**: For the Iris dataset, compute the mean *petal length* for each species.
3. **Handle missing data**: Make a copy of the Iris DataFrame, randomly set 10 values to `NaN`, then fill them using the column medians.
4. **Visualization**: Create a scatter plot of *sepal length* vs. *sepal width*, with points colored by species.
