# Basic Objects
# Creating a Series

In [1]:
import pandas as pd

s = pd.Series([10, 20, 30])
s

0    10
1    20
2    30
dtype: int64
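
The Series above gets the default 0, 1, 2 index. As a minimal sketch, assuming you want your own labels instead, you can pass an `index` argument; the labels `"a"`, `"b"`, `"c"` and the variable names below are made up for illustration.

```python
import pandas as pd

# Hypothetical labels "a", "b", "c", chosen only for this illustration.
s_labeled = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_label = s_labeled["b"]         # look up a value by its index label
by_position = s_labeled.iloc[1]   # look up the same value by integer position
doubled = s_labeled * 2           # arithmetic is applied element-wise

print(by_label, by_position)
print(doubled)
```

Both lookups return the same element here, because the label `"b"` sits at position 1.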

# Creating a DataFrame

In [2]:
df = pd.DataFrame({ "name" : ['A', 'B', 'C'], "age" : [21, 22, 24]})
df

Unnamed: 0,name,age
0,A,21
1,B,22
2,C,24


### 1. Data Importing (I/O)

For example, let's use the Titanic dataset from Kaggle.

In [3]:
df = pd.read_csv('Titanic-Dataset.csv')
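
If the Kaggle file is not at hand, the same call can be tried on in-memory text. This is only a sketch: the three rows below are hypothetical sample data with a subset of the Titanic columns, and `csv_text` and `sample` are illustrative names.

```python
import io
import pandas as pd

# Hypothetical sample data (not the real file); the last row has an empty Age field.
csv_text = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample)
print(sample["Age"].isna().sum())  # the empty field is read as a missing value (NaN)
```

Because one `Age` value is missing, pandas stores the column as floating point rather than integer, which is also why `Age` shows up as `float64` in the real dataset.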

### 2. Data Inspection

Once the data is loaded, inspect it to understand its structure and contents.

# Basic Data Inspection Functions in Pandas

| Function | What It Does | When to Use |
|----------|--------------|--------------|
| `df.head()` | Shows the **first 5 rows** of the DataFrame by default (`df.head(n)` shows the first `n`) | When you want a quick look at how your dataset starts (columns, first values) |
| `df.tail()` | Shows the **last 5 rows** by default (`df.tail(n)` shows the last `n`) | When checking the ending values or verifying loading issues at the bottom |
| `df.info()` | Displays **column names, non-null counts, data types, memory usage** | Best for checking missing values, data types, and overall structure |
| `df.describe()` | Gives **summary statistics** for numeric columns (mean, std, min, max…) | For quick statistical overview during EDA (Exploratory Data Analysis) |
| `df.shape` | Returns the **(rows, columns)** count | To know dataset size; especially before and after cleaning |
| `df.columns` | Lists all column names | Helps in selecting, renaming, and filtering columns |
| `df.dtypes` | Shows the **data type** of each column | Useful for spotting columns that need type conversion (e.g., object → int, object → datetime) |



In [4]:
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [5]:
df.tail()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


In [7]:
df.describe()

Unnamed: 0,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,891.0,891.0,891.0,714.0,891.0,891.0,891.0
mean,446.0,0.383838,2.308642,29.699118,0.523008,0.381594,32.204208
std,257.353842,0.486592,0.836071,14.526497,1.102743,0.806057,49.693429
min,1.0,0.0,1.0,0.42,0.0,0.0,0.0
25%,223.5,0.0,2.0,20.125,0.0,0.0,7.9104
50%,446.0,0.0,3.0,28.0,0.0,0.0,14.4542
75%,668.5,1.0,3.0,38.0,1.0,0.0,31.0
max,891.0,1.0,3.0,80.0,8.0,6.0,512.3292


In [8]:
df.shape

(891, 12)

In [9]:
df.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

In [10]:
df.dtypes

PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object
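
`df.info()` reports non-null counts, so missing values have to be worked out by subtraction. A complementary view, sketched here on a small hypothetical frame (`toy` is an illustrative name, and the values are made up), counts missing entries per column directly with `isna().sum()`.

```python
import pandas as pd

# Hypothetical three-row frame with a few missing values (None becomes NaN/missing).
toy = pd.DataFrame({
    "Age": [22.0, None, 26.0],
    "Cabin": [None, "C85", None],
    "Fare": [7.25, 71.2833, 7.925],
})

# isna() marks each missing cell as True; sum() counts the True values per column.
missing = toy.isna().sum()
print(missing)
```

Applied to the Titanic frame, the same expression would be `df.isna().sum()`.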

### 3. Selection & Filtering

Learn how to pick rows/columns or filter data using conditions.

In [11]:
df['Fare']

0       7.2500
1      71.2833
2       7.9250
3      53.1000
4       8.0500
        ...   
886    13.0000
887    30.0000
888    23.4500
889    30.0000
890     7.7500
Name: Fare, Length: 891, dtype: float64

In [12]:
df[['Pclass', 'Sex']]

Unnamed: 0,Pclass,Sex
0,3,male
1,1,female
2,3,female
3,1,female
4,3,male
...,...,...
886,2,male
887,1,female
888,3,female
889,1,male


#### 1. df.loc[ ] — Label-based selection

`loc` selects rows (and optionally columns) by label, that is, by the values stored in the index.

Example:

In [13]:
df.loc[0]   # row whose index label is 0 (the first row here)

PassengerId                          1
Survived                             0
Pclass                               3
Name           Braund, Mr. Owen Harris
Sex                               male
Age                               22.0
SibSp                                1
Parch                                0
Ticket                       A/5 21171
Fare                              7.25
Cabin                              NaN
Embarked                             S
Name: 0, dtype: object

- This selects the row whose index label is 0.
- If your index is custom (like names or dates), `loc` still looks rows up by label, not by position.

Use `loc` when:
- Your index is not plain numbers (for example, strings)
- Your index holds meaningful labels (like dates or IDs)
- You want to select rows by label
- You want to select both rows and columns at once

Example with columns:

In [14]:
df.loc[0, 'Age']

22.0
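
To see label-based selection with a non-numeric index, here is a small sketch that reuses the name/age data from the DataFrame example and, as an assumption for illustration, promotes `name` to the index. The variable names are hypothetical.

```python
import pandas as pd

# Same data as the earlier DataFrame example, with "name" used as the index.
people = pd.DataFrame(
    {"name": ["A", "B", "C"], "age": [21, 22, 24]}
).set_index("name")

age_b = people.loc["B", "age"]                # one row label, one column label
ages_a_c = people.loc[["A", "C"], "age"]      # a list of row labels

print(age_b)
print(ages_a_c)
```

With this index, `people.loc[0]` would raise a `KeyError`, because 0 is not one of the labels.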

#### 2. df.iloc[ ] — Position-based selection

`iloc` selects rows (and optionally columns) by integer position, like list indexing.

Example:

In [15]:
df.iloc[0]

PassengerId                          1
Survived                             0
Pclass                               3
Name           Braund, Mr. Owen Harris
Sex                               male
Age                               22.0
SibSp                                1
Parch                                0
Ticket                       A/5 21171
Fare                              7.25
Cabin                              NaN
Embarked                             S
Name: 0, dtype: object

This selects the first row, no matter what the index label is.

Use `iloc` when:

- You want to access rows by position (0, 1, 2...)
- You don’t care about labels
- You want to slice a range of positions (the end position is excluded, as in Python lists)

In [16]:
df.iloc[0:5]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S
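
On the Titanic frame the labels and positions coincide, so `loc` and `iloc` look interchangeable. The sketch below uses a hypothetical frame whose index is 10, 20, 30 to show where they differ; all names and values are illustrative.

```python
import pandas as pd

# Hypothetical frame whose index labels (10, 20, 30) differ from positions (0, 1, 2).
frame = pd.DataFrame({"age": [21, 22, 24]}, index=[10, 20, 30])

first_by_position = frame.iloc[0, 0]   # position 0 -> the row labelled 10
first_by_label = frame.loc[10, "age"]  # label 10 -> the same row

pos_slice = frame.iloc[0:2]    # positions 0 and 1; the end position 2 is excluded
label_slice = frame.loc[10:20] # labels 10 through 20; the end label is included

print(pos_slice)
print(label_slice)
```

Both slices return the rows labelled 10 and 20, but for different reasons: the position slice stops before position 2, while the label slice runs up to and including label 20.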


### 3. Filtering using conditions

Placing a boolean condition inside `df[...]` keeps only the rows where the condition is True.

Use when:

- You want to filter rows
- You want only data that matches a condition

#### Numeric filtering

In [17]:
df[df['Age'] > 70]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
96,97,0,1,"Goldschmidt, Mr. George B",male,71.0,0,0,PC 17754,34.6542,A5,C
116,117,0,3,"Connors, Mr. Patrick",male,70.5,0,0,370369,7.75,,Q
493,494,0,1,"Artagaveytia, Mr. Ramon",male,71.0,0,0,PC 17609,49.5042,,C
630,631,1,1,"Barkworth, Mr. Algernon Henry Wilson",male,80.0,0,0,27042,30.0,A23,S
851,852,0,3,"Svensson, Mr. Johan",male,74.0,0,0,347060,7.775,,S


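This returns only the rows where `Age` is greater than 70. Rows with a missing `Age` are left out, because a comparison against NaN evaluates to False.

Use this for any numeric comparison, such as marks, salaries, scores, or temperatures.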

#### Filtering categorical conditions

This filters rows where `Sex` is exactly "female" (the comparison is case-sensitive).

Example:

In [18]:
df[df['Sex'] == 'female']

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,11.1333,,S
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.0,1,0,237736,30.0708,,C
...,...,...,...,...,...,...,...,...,...,...,...,...
880,881,1,2,"Shelley, Mrs. William (Imanita Parrish Hall)",female,25.0,0,1,230433,26.0000,,S
882,883,0,3,"Dahlberg, Miss. Gerda Ulrika",female,22.0,0,0,7552,10.5167,,S
885,886,0,3,"Rice, Mrs. William (Margaret Norton)",female,39.0,0,5,382652,29.1250,,Q
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S


This is useful for:

- Filtering by category
- Filtering strings
- Getting specific groups (city, gender, department, etc.)
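
Numeric and categorical conditions can also be combined. The following is a sketch on a small hypothetical frame (made-up rows with Titanic-style column names): `&` means "and", `|` means "or", each condition needs its own parentheses, and `isin` matches any value from a list.

```python
import pandas as pd

# Hypothetical rows using the same column names as the Titanic dataset.
passengers = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Embarked": ["S", "C", "S", "S"],
})

# Both conditions must hold; the parentheses are required around each one.
women_over_30 = passengers[(passengers["Sex"] == "female") & (passengers["Age"] > 30)]

# Keep rows whose Embarked value is any of the listed categories.
from_s_or_q = passengers[passengers["Embarked"].isin(["S", "Q"])]

print(women_over_30)
print(from_s_or_q)
```

Python's plain `and` / `or` keywords do not work element-wise on a Series, which is why the `&` and `|` operators are used here.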

# Summary: loc, iloc, and Different Types of Filtering

| Command | What it Does | When to Use |
|--------|---------------|-------------|
| `df.loc[]` | Selects rows **by label** | When your index has names (like dates, IDs), or when selecting both rows and columns |
| `df.iloc[]` | Selects rows **by position** | When selecting using row numbers (0,1,2...) |
| `df[df[col] > x]` | Filters rows using a **numeric condition** | When filtering age, marks, salary, temperature, etc. |
| `df[df[col] == value]` | Filters **categorical/string values** | When selecting rows with matching categories (city, gender, department, etc.) |
| `df[df[col] >= x]` | Filters using a **threshold** | When selecting top-scoring, high-salary, or above-average values |

