## Step 1: Importing Pandas

In [1]:
import pandas as pd

Always import pandas as pd. This alias is the standard convention in the Python community.

## Step 2: Reading CSV Files

**Syntax**

In [30]:
df = pd.read_csv('Dataset/data.csv')
print(df)

     Name  Age  Gender  Score
0   Alice   23  Female   88.0
1     Bob   25    Male   76.0
2   Clara   22  Female    NaN
3  farzad   22    Male   79.0
4    Sara   24  Female   65.0
5   Sahar   26  Female   83.0
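To follow along without the file on disk, the same table can be built from an in-memory string. This is only a sketch: the CSV text below is reconstructed from the printed output above, and it assumes Clara's missing Score is stored as an empty field.

```python
import io
import pandas as pd

# CSV text reconstructed from the printed table (an assumption about the file's contents)
csv_text = """Name,Age,Gender,Score
Alice,23,Female,88
Bob,25,Male,76
Clara,22,Female,
farzad,22,Male,79
Sara,24,Female,65
Sahar,26,Female,83
"""

# read_csv accepts any file-like object, not only a path
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

Because one Score is missing, pandas reads that column as floating point, which is why the other scores print as 88.0, 76.0, and so on.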


## Step 3: Exploring the Data

In [18]:
print(df.head())      # Show first 5 rows

     Name  Age  Gender  Score
0   Alice   23  Female   88.0
1     Bob   25    Male   76.0
2   Clara   22  Female    NaN
3  farzad   22    Male   79.0
4    Sara   24  Female   65.0


In [19]:
print(df.tail())      # Show last 5 rows

     Name  Age  Gender  Score
1     Bob   25    Male   76.0
2   Clara   22  Female    NaN
3  farzad   22    Male   79.0
4    Sara   24  Female   65.0
5   Sahar   26  Female   83.0


In [20]:
print(df.shape)       # (rows, columns)

(6, 4)


In [21]:
print(df.columns)     # Column names

Index(['Name', 'Age', 'Gender', 'Score'], dtype='object')


In [22]:
df.info()             # Column dtypes, non-null counts, memory usage (prints directly)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Name    6 non-null      object 
 1   Age     6 non-null      int64  
 2   Gender  6 non-null      object 
 3   Score   5 non-null      float64
dtypes: float64(1), int64(1), object(2)
memory usage: 324.0+ bytes


In [11]:
print(df.describe())  # Statistical summary

             Age      Score
count   6.000000   5.000000
mean   23.666667  78.200000
std     1.632993   8.642916
min    22.000000  65.000000
25%    22.250000  76.000000
50%    23.500000  79.000000
75%    24.750000  83.000000
max    26.000000  88.000000


**Practical Note:**

Call .head() and .info() often when exploring a dataset for machine learning. They help catch issues such as:
* Missing values
* Wrong data types
* Unexpected column names
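Each of the three checks listed above takes only a line or two. The sketch below uses a small hypothetical frame rather than the tutorial's file; the padded ' Age' column name and the text-typed Score column are invented for illustration.

```python
import pandas as pd

# Hypothetical messy data: a padded column name, a missing value, numbers stored as text
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Clara"],
    " Age": [23, 25, 22],
    "Score": ["88", "76", None],
})

# Unexpected column names: strip surrounding whitespace
df.columns = df.columns.str.strip()

# Missing values: count missing entries per column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Wrong data types: convert text to numbers; unparseable entries become NaN
df["Score"] = pd.to_numeric(df["Score"], errors="coerce")
print(df.dtypes)
```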

## Step 4: Cleaning/Preprocessing While Reading

**1. Skip rows**

In [23]:
df = pd.read_csv('Dataset/data.csv', skiprows=1)
print(df)

    Alice  23  Female    88
0     Bob  25    Male  76.0
1   Clara  22  Female   NaN
2  farzad  22    Male  79.0
3    Sara  24  Female  65.0
4   Sahar  26  Female  83.0
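Note that skiprows=1 skips the first line of the file, which here is the header, so Alice's row is promoted to the column names in the output above. If the goal is to drop a data row and keep the header, a list of line indices can be passed instead. A minimal sketch, using an in-memory CSV that is assumed to mirror Dataset/data.csv:

```python
import io
import pandas as pd

csv_text = """Name,Age,Gender,Score
Alice,23,Female,88
Bob,25,Male,76
Clara,22,Female,
"""

# skiprows=[1] skips only file line 1 (0-based), i.e. Alice's row; line 0 stays as the header
df_keep_header = pd.read_csv(io.StringIO(csv_text), skiprows=[1])
print(df_keep_header)

# To discard the file's header and supply replacement names instead
# (the lowercase names are hypothetical)
df_renamed = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=1,
    header=None,
    names=["name", "age", "gender", "score"],
)
print(df_renamed)
```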


**2. Set a column as index**

In [24]:
df = pd.read_csv('Dataset/data.csv', index_col='Name')
print(df)

        Age  Gender  Score
Name                      
Alice    23  Female   88.0
Bob      25    Male   76.0
Clara    22  Female    NaN
farzad   22    Male   79.0
Sara     24  Female   65.0
Sahar    26  Female   83.0
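With Name as the index, rows can be looked up by label using .loc. A short sketch on an in-memory copy of part of the data (assumed to match the file):

```python
import io
import pandas as pd

csv_text = """Name,Age,Gender,Score
Alice,23,Female,88
Bob,25,Male,76
Sara,24,Female,65
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="Name")

# Select one row by its index label
bob = df.loc["Bob"]
print(bob)

# Select a single cell by row label and column name
sara_score = df.loc["Sara", "Score"]
print(sara_score)

# reset_index() turns the index back into an ordinary column
df_flat = df.reset_index()
print(df_flat.columns.tolist())
```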


**3. Handle missing values**

In [25]:
df = pd.read_csv('Dataset/data.csv', na_values=[''])
print(df.isnull())

    Name    Age  Gender  Score
0  False  False   False  False
1  False  False   False  False
2  False  False   False   True
3  False  False   False  False
4  False  False   False  False
5  False  False   False  False
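Empty fields are already read as NaN by default, so na_values matters most when a file marks missing data with its own placeholder. The sketch below assumes two hypothetical placeholders, '?' and 'missing'; they are not taken from the tutorial's dataset.

```python
import io
import pandas as pd

csv_text = """Name,Age,Score
Alice,23,88
Bob,?,76
Clara,22,missing
"""

# Without na_values the placeholders are kept, so Age and Score are read as text
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.dtypes)

# With na_values the placeholders are parsed as NaN and both columns stay numeric
df = pd.read_csv(io.StringIO(csv_text), na_values=["?", "missing"])
print(df.isnull().sum())
```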


## Step 5: Writing to a CSV File

In [33]:
df = pd.DataFrame({
    'City': ['Tehran', 'Shiraz', 'Tabriz'],
    'Population': [8500000, 1800000, 1550000]
})

df.to_csv('Dataset/cities.csv', index=False)

**Practical Notes:**

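* Use index=False to keep pandas from writing the default row numbers as an extra column. Leave it out when the index holds meaningful data, such as names set with index_col.
* Use encoding='utf-8-sig' when the file contains Persian or other non-ASCII text and will be opened in Excel. The byte order mark it adds lets Excel detect UTF-8.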
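A quick way to check both notes is to write a file and read it back. The sketch below writes to a temporary directory (an assumption, to avoid touching Dataset/) and uses a Persian city name as sample text.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({
    "City": ["تهران", "Shiraz"],
    "Population": [8500000, 1800000],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cities.csv")

    # utf-8-sig writes a byte order mark at the start of the file
    df.to_csv(path, index=False, encoding="utf-8-sig")

    with open(path, "rb") as f:
        starts_with_bom = f.read(3) == b"\xef\xbb\xbf"

    # Reading with the same encoding strips the mark again
    df_back = pd.read_csv(path, encoding="utf-8-sig")

print(starts_with_bom)
print(df_back.columns.tolist())
```

Since index=False was used, the file read back has only the City and Population columns, with no extra unnamed index column.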

## Step 6: Using CSV in Machine Learning Projects

**Typical ML Workflow:**

In [None]:
# Step 1: Load the dataset
df = pd.read_csv('ml_data.csv')

# Step 2: Check for missing values
print(df.isnull().sum())

# Step 3: Clean data (drop rows with missing values)
df = df.dropna()

# Step 4: Feature and target split
X = df[['feature1', 'feature2', 'feature3']]  # input features
y = df['label']  # output label

# Step 5: Train-test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
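
Dropping rows is the simplest cleaning step, but it discards data. One common alternative is to fill numeric gaps with a column statistic such as the median. A minimal sketch on hypothetical data (the names feature1 and label follow the placeholders above; the values are invented):

```python
import pandas as pd

df = pd.DataFrame({
    "feature1": [1.0, None, 3.0, 5.0],
    "label": [0, 1, 0, 1],
})

# Median of the non-missing values 1, 3 and 5
median_value = df["feature1"].median()

# Fill the gap instead of dropping the row
df["feature1"] = df["feature1"].fillna(median_value)
print(df)
```

In a real project, compute the fill value from the training split only, so that information from the test set does not leak into training.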