## PART 1: Pandas Installation and Import

If pandas is not installed yet, run this first (from a terminal, or as `%pip install pandas` inside a notebook cell):

In [None]:
pip install pandas

Then import it in Python:

In [6]:
import pandas as pd

## PART 2: What is a Series?

A Series is like a single column in Excel.
It is a 1D labeled array capable of holding any data type.

**Example:**

In [7]:
import pandas as pd

data = [10, 20, 30, 40]
s = pd.Series(data)
print(s)

0    10
1    20
2    30
3    40
dtype: int64


**Practical Notes:**

* The left side (0,1,2,3) is the index.
* The right side is the data.
* A Series can have custom labels.

In [8]:
s = pd.Series(data, index=["a", "b", "c", "d"])
print(s)

a    10
b    20
c    30
d    40
dtype: int64
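
With custom labels in place, values can be looked up either by label or by position. The short sketch below rebuilds the same Series and shows both; the variable names `by_label`, `by_position`, and `subset` are only illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

by_label = s["b"]         # label-based lookup
by_position = s.iloc[1]   # position-based lookup (second element)
subset = s[["a", "d"]]    # a list of labels returns a smaller Series

print(by_label, by_position)
print(subset)
```

Using `.iloc` for positions keeps the intent explicit, which matters once the index labels are no longer the default 0, 1, 2, ...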


**Common Series Use in ML:**

In [11]:
# Labels in supervised learning
labels = pd.Series(["spam", "ham", "spam", "ham"])
print(labels)

0    spam
1     ham
2    spam
3     ham
dtype: object
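
Two things are often done with a label Series before training: checking the class balance and turning text labels into numbers. The sketch below is one possible way to do that; the mapping `{"ham": 0, "spam": 1}` is an assumption chosen for illustration, not something the dataset prescribes.

```python
import pandas as pd

labels = pd.Series(["spam", "ham", "spam", "ham"])

# How many examples fall into each class
counts = labels.value_counts()

# Hypothetical encoding: map each text label to an integer
encoded = labels.map({"ham": 0, "spam": 1})

print(counts)
print(encoded)
```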


## PART 3: What is a DataFrame?

A DataFrame is like an Excel sheet. It’s a 2D labeled structure with rows and columns.

**Example:**

In [12]:
data = {
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [85.5, 92.3, 88.0]
}

df = pd.DataFrame(data)
print(df)

      name  age  score
0    Alice   25   85.5
1      Bob   30   92.3
2  Charlie   35   88.0


**Practical Notes:**

* A column in a DataFrame is a Series.
* DataFrames suit tabular data, such as machine learning datasets loaded from CSV or Excel files.
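
The first note can be checked directly: selecting one column returns a Series that shares the DataFrame's index. This is a small sketch using the same example data; `ages` is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "score": [85.5, 92.3, 88.0],
})

ages = df["age"]                      # one column of the DataFrame

print(type(ages))                     # the column is a pandas Series
print(ages.name)                      # the Series keeps the column name
print(ages.index.equals(df.index))    # and shares the row index
print(ages.mean())                    # Series methods work on it
```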

## PART 4: Creating a DataFrame in Different Ways

From a list of dictionaries:

In [14]:
data = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30},
]
df = pd.DataFrame(data)
print(df)

    name  age
0  Alice   25
1    Bob   30


From a CSV (common in ML datasets):

In [22]:
df = pd.read_csv("Dataset/titanic.csv")
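
`pd.read_csv` also accepts file-like objects, which makes it easy to try its options without a file on disk. The sketch below reads a tiny made-up CSV from a string; the column names `id`, `label`, and `value` are hypothetical and unrelated to the Titanic file.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk
csv_text = """id,label,value
1,spam,0.5
2,ham,
3,spam,1.5
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["id", "value"],   # load only the columns that are needed
    index_col="id",            # use the id column as the row index
)

print(sample)
```

Empty fields such as the missing `value` in the second row are read as `NaN`.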

## PART 5: Exploring a DataFrame

In [23]:
df.head()         # First 5 rows

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [24]:
df.tail(3)        # Last 3 rows

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q


In [25]:
df.shape          # (rows, columns)

(891, 12)

In [26]:
df.columns        # Column labels (an Index object)

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

In [27]:
df.dtypes         # Data types

PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object

In [28]:
df.info()         # Summary

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


## PART 6: Accessing Data

**Accessing a Column:**

In [31]:
df["Name"]         # Returns a Series

0                                Braund, Mr. Owen Harris
1      Cumings, Mrs. John Bradley (Florence Briggs Th...
2                                 Heikkinen, Miss. Laina
3           Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                               Allen, Mr. William Henry
                             ...                        
886                                Montvila, Rev. Juozas
887                         Graham, Miss. Margaret Edith
888             Johnston, Miss. Catherine Helen "Carrie"
889                                Behr, Mr. Karl Howell
890                                  Dooley, Mr. Patrick
Name: Name, Length: 891, dtype: object

In [32]:
df[["Name", "Age"]]  # Returns a new DataFrame

,Name,Age
0,"Braund, Mr. Owen Harris",22.0
1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",38.0
2,"Heikkinen, Miss. Laina",26.0
3,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",35.0
4,"Allen, Mr. William Henry",35.0
...,...,...
886,"Montvila, Rev. Juozas",27.0
887,"Graham, Miss. Margaret Edith",19.0
888,"Johnston, Miss. Catherine Helen ""Carrie""",
889,"Behr, Mr. Karl Howell",26.0


**Accessing a Row:**

In [33]:
df.loc[0]     # By index label

PassengerId                          1
Survived                             0
Pclass                               3
Name           Braund, Mr. Owen Harris
Sex                               male
Age                               22.0
SibSp                                1
Parch                                0
Ticket                       A/5 21171
Fare                              7.25
Cabin                              NaN
Embarked                             S
Name: 0, dtype: object

In [34]:
df.iloc[1]    # By integer position

PassengerId                                                    2
Survived                                                       1
Pclass                                                         1
Name           Cumings, Mrs. John Bradley (Florence Briggs Th...
Sex                                                       female
Age                                                         38.0
SibSp                                                          1
Parch                                                          0
Ticket                                                  PC 17599
Fare                                                     71.2833
Cabin                                                        C85
Embarked                                                       C
Name: 1, dtype: object
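
With the default index (0, 1, 2, ...), `loc` and `iloc` happen to return the same rows, so the difference is easy to miss. The sketch below uses a small made-up DataFrame with non-default labels to separate the two; `df_small` and its index values are hypothetical.

```python
import pandas as pd

df_small = pd.DataFrame(
    {"age": [25, 30, 35]},
    index=[10, 20, 30],    # hypothetical labels that differ from positions
)

row_by_label = df_small.loc[20]       # row whose index label is 20
row_by_position = df_small.iloc[0]    # first row, whatever its label is
cell = df_small.loc[30, "age"]        # a single value: row label, column name

print(row_by_label["age"], row_by_position["age"], cell)
```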

## PART 7: Adding, Modifying, and Removing Data

**Add a new column:**

In [37]:
df["new_col"] = df["Age"] * 2
df.head()         # First 5 rows

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,new_col
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,44.0
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,76.0
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,52.0
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,70.0
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,70.0


**Modify values:**

In [38]:
df["Age"] = df["Age"] + 1
df.head()         # First 5 rows

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,new_col
0,1,0,3,"Braund, Mr. Owen Harris",male,23.0,1,0,A/5 21171,7.25,,S,44.0
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,39.0,1,0,PC 17599,71.2833,C85,C,76.0
2,3,1,3,"Heikkinen, Miss. Laina",female,27.0,0,0,STON/O2. 3101282,7.925,,S,52.0
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,36.0,1,0,113803,53.1,C123,S,70.0
4,5,0,3,"Allen, Mr. William Henry",male,36.0,0,0,373450,8.05,,S,70.0


**Remove a column:**

In [39]:
df = df.drop(columns="Cabin")
df.head()         # First 5 rows

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Embarked,new_col
0,1,0,3,"Braund, Mr. Owen Harris",male,23.0,1,0,A/5 21171,7.25,S,44.0
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,39.0,1,0,PC 17599,71.2833,C,76.0
2,3,1,3,"Heikkinen, Miss. Laina",female,27.0,0,0,STON/O2. 3101282,7.925,S,52.0
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,36.0,1,0,113803,53.1,S,70.0
4,5,0,3,"Allen, Mr. William Henry",male,36.0,0,0,373450,8.05,S,70.0
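
The cells above change `df` step by step. An alternative, sketched below on made-up data, is to chain `assign` and `drop` so the original DataFrame stays untouched; the names `df_people`, `age_doubled`, and `cabin` are hypothetical.

```python
import pandas as pd

df_people = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [25, 30],
    "cabin": ["C85", None],
})

df_new = (
    df_people
    .assign(age_doubled=lambda d: d["age"] * 2)   # add a derived column
    .drop(columns=["cabin"])                      # remove a column
)

print(df_new)
print(df_people.columns.tolist())   # the original still has "cabin"
```

Keeping the original intact makes it easier to re-run notebook cells without the results drifting on each run.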


## PART 8: Useful Functions in Machine Learning

**Describe (get stats):**

In [40]:
df.describe()

,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare,new_col
count,891.0,891.0,891.0,714.0,891.0,891.0,891.0,714.0
mean,446.0,0.383838,2.308642,30.699118,0.523008,0.381594,32.204208,59.398235
std,257.353842,0.486592,0.836071,14.526497,1.102743,0.806057,49.693429,29.052995
min,1.0,0.0,1.0,1.42,0.0,0.0,0.0,0.84
25%,223.5,0.0,2.0,21.125,0.0,0.0,7.9104,40.25
50%,446.0,0.0,3.0,29.0,0.0,0.0,14.4542,56.0
75%,668.5,1.0,3.0,39.0,1.0,0.0,31.0,76.0
max,891.0,1.0,3.0,81.0,8.0,6.0,512.3292,160.0


**Missing data:**

In [41]:
df.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Embarked         2
new_col        177
dtype: int64
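
Once missing values are counted, a common next step is to fill them. The filtering output further down shows `Age` equal to 30.699118 (the column mean reported by `describe`) on rows where `new_col` is empty, which suggests the missing ages were filled with the mean in a cell that is not shown. A minimal sketch of that kind of step, on made-up data:

```python
import pandas as pd

df_ages = pd.DataFrame({"Age": [20.0, None, 40.0]})   # hypothetical values

mean_age = df_ages["Age"].mean()                  # NaN is skipped by default
df_ages["Age"] = df_ages["Age"].fillna(mean_age)  # replace NaN with the mean

print(df_ages)
```

Mean imputation is only one option; whether it is appropriate depends on the column and the model.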

**Filtering:**

In [45]:
df[df["Age"] > 30]    # Return rows where age > 30

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Embarked,new_col
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,39.000000,1,0,PC 17599,71.2833,C,76.0
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,36.000000,1,0,113803,53.1000,S,70.0
4,5,0,3,"Allen, Mr. William Henry",male,36.000000,0,0,373450,8.0500,S,70.0
5,6,0,3,"Moran, Mr. James",male,30.699118,0,0,330877,8.4583,Q,
6,7,0,1,"McCarthy, Mr. Timothy J",male,55.000000,0,0,17463,51.8625,S,108.0
...,...,...,...,...,...,...,...,...,...,...,...,...
879,880,1,1,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",female,57.000000,0,1,11767,83.1583,C,112.0
881,882,0,3,"Markun, Mr. Johann",male,34.000000,0,0,349257,7.8958,S,66.0
885,886,0,3,"Rice, Mrs. William (Margaret Norton)",female,40.000000,0,5,382652,29.1250,Q,78.0
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,30.699118,1,2,W./C. 6607,23.4500,S,
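
Filters can be combined with `&` (and) and `|` (or). Each condition needs its own parentheses because these operators bind more tightly than comparisons. The sketch below uses a small made-up DataFrame with the same column names.

```python
import pandas as pd

df_p = pd.DataFrame({
    "Age": [22.0, 39.0, 55.0, 36.0],
    "Sex": ["male", "female", "male", "male"],
})

# Rows where both conditions hold
older_men = df_p[(df_p["Age"] > 30) & (df_p["Sex"] == "male")]

# Rows where at least one condition holds
young_or_female = df_p[(df_p["Age"] <= 30) | (df_p["Sex"] == "female")]

print(older_men)
print(young_or_female)
```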


**Sorting:**

In [47]:
df.sort_values("Age", ascending=False)

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Embarked,new_col
630,631,1,1,"Barkworth, Mr. Algernon Henry Wilson",male,81.00,0,0,27042,30.0000,S,160.00
851,852,0,3,"Svensson, Mr. Johan",male,75.00,0,0,347060,7.7750,S,148.00
96,97,0,1,"Goldschmidt, Mr. George B",male,72.00,0,0,PC 17754,34.6542,C,142.00
493,494,0,1,"Artagaveytia, Mr. Ramon",male,72.00,0,0,PC 17609,49.5042,C,142.00
116,117,0,3,"Connors, Mr. Patrick",male,71.50,0,0,370369,7.7500,Q,141.00
...,...,...,...,...,...,...,...,...,...,...,...,...
831,832,1,2,"Richards, Master. George Sibley",male,1.83,1,1,29106,18.7500,S,1.66
469,470,1,3,"Baclini, Miss. Helene Barbara",female,1.75,2,1,2666,19.2583,C,1.50
644,645,1,3,"Baclini, Miss. Eugenie",female,1.75,2,1,2666,19.2583,C,1.50
755,756,1,2,"Hamalainen, Master. Viljo",male,1.67,1,1,250649,14.5000,S,1.34
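
`sort_values` returns a new DataFrame and keeps the original row labels, which is why the index in the output above looks shuffled. The sketch below, on made-up data, sorts by two columns and then renumbers the rows with `reset_index`.

```python
import pandas as pd

df_s = pd.DataFrame({
    "Pclass": [3, 1, 3, 1],
    "Age": [22.0, 38.0, 26.0, 35.0],
})

ranked = (
    df_s
    .sort_values(["Pclass", "Age"], ascending=[True, False])  # class up, age down
    .reset_index(drop=True)                                   # renumber rows 0..n-1
)

print(ranked)
```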


## PART 9: Series vs DataFrame Recap

| Feature | Series | DataFrame |
| --- | --- | --- |
| Dimensionality | 1D | 2D |
| Structure | Indexed array | Table (rows and columns) |
| Use case in ML | Labels, targets | Dataset (features & targets) |
| Access methods | s.iloc[0], s["label"] | df["col"], df.loc[], df.iloc[] |
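
The "Use case in ML" row can be illustrated by splitting one DataFrame into a feature table and a target Series. This is a sketch on made-up data; the column `passed` is a hypothetical target, and `X` and `y` are just conventional names.

```python
import pandas as pd

df_ml = pd.DataFrame({
    "age": [25, 30, 35],
    "score": [85.5, 92.3, 88.0],
    "passed": [0, 1, 1],          # hypothetical target column
})

X = df_ml.drop(columns=["passed"])   # features: still a DataFrame
y = df_ml["passed"]                  # target: a Series

print(X.shape, y.shape)
```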