# Python Pandas & NumPy Data Analysis Notebook
This notebook covers data exploration, manipulation, and analysis using `pandas` and `numpy`. 
We will learn:
- Reading CSV files
- Inspecting DataFrames
- Working with Series
- Indexing, slicing, and filtering
- Boolean indexing
- Multi-indexing
- DataFrame operations like adding/dropping columns and rows
- Basic plotting with `matplotlib`

This notebook is designed for **hands-on practice** with auto-generated CSV data.

In [2]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

## Reading CSV File
- `pd.read_csv()` is used to read CSV files into a DataFrame.
- Make sure your file exists at the path provided.
- `.head()` shows the first 5 rows by default; you can pass a number to see more rows.


In [5]:
titanic_train = pd.read_csv(r"C:\Users\rahul\OneDrive\Desktop\pythonLearning\titanic_sample_100.csv")
titanic_train.head()

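| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 3 | Passenger_1 | female | 64.2 | 0 | 2 | TCKT1000 | 278.82 | D56 | S |
| 1 | 2 | 1 | 3 | Passenger_2 | male | 70.3 | 0 | 0 | TCKT1001 | 400.08 | B78 | Q |
| 2 | 3 | 1 | 2 | Passenger_3 | female | 24.9 | 2 | 0 | TCKT1002 | 401.1 | C123 | C |
| 3 | 4 | 0 | 3 | Passenger_4 | male | 49.1 | 1 | 1 | TCKT1003 | 247.1 | E45 | C |
| 4 | 5 | 1 | 2 | Passenger_5 | female | 7.7 | 2 | 0 | TCKT1004 | 416.93 | C123 | Q |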
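The path above is specific to one machine, so the cell will fail elsewhere. As a minimal sketch for practicing without that file, `pd.read_csv()` also accepts a file-like object. The inline text below is a stand-in built from a few columns of the rows shown above, and the names `csv_text`, `sample`, `first_five`, and `first_three` are illustrative only.

```python
import io

import pandas as pd

# Stand-in for the CSV file: a few columns of the five rows shown above
csv_text = """PassengerId,Survived,Pclass,Name,Age
1,1,3,Passenger_1,64.2
2,1,3,Passenger_2,70.3
3,1,2,Passenger_3,24.9
4,0,3,Passenger_4,49.1
5,1,2,Passenger_5,7.7
"""

# read_csv accepts a path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

first_five = sample.head()    # default: up to 5 rows
first_three = sample.head(3)  # explicit row count

print(first_three)
```

Swapping `io.StringIO(csv_text)` for a real path gives the same kind of DataFrame.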


### Exploring the Dataset
- `.tail()` shows the last few rows.
- `.dtypes` displays the type of data in each column (`object` for text, `int64` or `float64` for numbers).
- Comparing `dtypes` with `'object'` produces a Boolean mask that is `True` for every text/categorical column.


In [6]:
# A notebook cell displays only its last expression, so the output below is the Boolean mask.
titanic_train.head(10)  # First 10 rows
titanic_train.tail(10)  # Last 10 rows
titanic_train.dtypes  # Column data types
titanic_train.dtypes == 'object'  # Boolean mask for object columns

PassengerId    False
Survived       False
Pclass         False
Name            True
Sex             True
Age            False
SibSp          False
Parch          False
Ticket          True
Fare           False
Cabin           True
Embarked        True
dtype: bool
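As a possible alternative to building the mask by hand, pandas provides `DataFrame.select_dtypes()`. The sketch below compares both approaches on a small hypothetical frame (`df` stands in for `titanic_train`, and the text column is given an explicit `object` dtype so the example does not depend on the pandas version's default string type).

```python
import pandas as pd

# Hypothetical three-row frame standing in for titanic_train
df = pd.DataFrame({
    "Name": pd.Series(["Passenger_1", "Passenger_2", "Passenger_3"], dtype=object),
    "Age": [64.2, 70.3, 24.9],
    "Pclass": [3, 3, 2],
})

# Approach used in this notebook: Boolean mask over the dtypes Series
mask = df.dtypes == "object"
obj_cols_mask = list(df.dtypes[mask].index)

# Alternative: let pandas filter the columns by dtype
obj_cols_select = list(df.select_dtypes(include="object").columns)
num_cols = list(df.select_dtypes(include="number").columns)

print(obj_cols_mask, obj_cols_select, num_cols)
```

Both routes should give the same column names; `select_dtypes()` simply skips the intermediate mask.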

### Object Columns
- Select only the object (text/categorical) columns using Boolean indexing.
- `.describe()` on object columns shows the count of non-missing values, the number of unique values, the most frequent value (`top`), and how often it occurs (`freq`).

In [7]:
# Names of the columns whose dtype is object
obj_cols = titanic_train.dtypes[titanic_train.dtypes == 'object'].index
obj_cols
# Summary statistics for the text columns only
titanic_train[obj_cols].describe()

| | Name | Sex | Ticket | Cabin | Embarked |
|---|---|---|---|---|---|
| count | 100 | 100 | 100 | 78 | 100 |
| unique | 100 | 2 | 100 | 4 | 3 |
| top | Passenger_1 | female | TCKT1000 | B78 | S |
| freq | 1 | 52 | 1 | 23 | 46 |
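The `Cabin` count of 78 against 100 rows reflects that `count` ignores missing values. A small sketch of how the four rows relate to `value_counts()`, using a hypothetical four-value Series (`cabin`) rather than the real column:

```python
import pandas as pd

# Hypothetical cabin values with one missing entry
cabin = pd.Series(["D56", None, "B78", "B78"], dtype=object)

summary = cabin.describe()     # count, unique, top, freq
counts = cabin.value_counts()  # occurrences per distinct value, missing values excluded

print(summary)
print(counts)
```

Here `top` and `freq` correspond to the first entry of `value_counts()`, and `count` is the number of non-missing values.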


### Numeric Column Descriptions
- `.describe()` on the entire DataFrame gives statistics for numeric columns: count, mean, std, min, max, and quartiles.

In [8]:
titanic_train.describe()

| | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 50.5 | 0.52 | 2.04 | 41.066 | 0.99 | 1.06 | 258.7572 |
| std | 29.011492 | 0.502117 | 0.815506 | 22.378373 | 0.771919 | 0.850668 | 140.820608 |
| min | 1.0 | 0.0 | 1.0 | 1.1 | 0.0 | 0.0 | 10.48 |
| 25% | 25.75 | 0.0 | 1.0 | 24.575 | 0.0 | 0.0 | 146.1175 |
| 50% | 50.5 | 1.0 | 2.0 | 40.3 | 1.0 | 1.0 | 269.78 |
| 75% | 75.25 | 1.0 | 3.0 | 60.575 | 2.0 | 2.0 | 357.58 |
| max | 100.0 | 1.0 | 3.0 | 78.6 | 2.0 | 2.0 | 495.54 |
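Each row of this summary matches a standalone method, which can help when only one statistic is needed. The sketch below uses the first five `Age` values from the earlier output as a stand-in Series (`ages`); the `percentiles` argument shows one way to request different quantiles.

```python
import pandas as pd

# First five Age values from the head() output, used as a small stand-in
ages = pd.Series([64.2, 70.3, 24.9, 49.1, 7.7])

summary = ages.describe()

# The same numbers computed one at a time
mean_age = ages.mean()
median_age = ages.median()      # corresponds to the 50% row
q1_age = ages.quantile(0.25)    # corresponds to the 25% row
std_age = ages.std()            # sample standard deviation (ddof=1)

# Ask for the 10th, 50th, and 90th percentiles instead of the quartiles
custom = ages.describe(percentiles=[0.1, 0.5, 0.9])

print(summary)
print(custom)
```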


### Column Operations
- Select a single column with `df['col']` (returns a Series) or several columns with `df[['col1','col2']]` (returns a DataFrame).
- Python's built-in `sorted()` sorts a column's values and returns a plain list.
- `.describe()` works on a single column as well.
- `.iloc[start:end:step]` selects rows by integer position.

In [None]:
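titanic_train[['Name','Sex','Age']]  # Several columns (DataFrame)
sorted(titanic_train['Name'])[5:30:2]  # Sorted names as a list, every 2nd item from position 5 up to 29
titanic_train['Name'].describe()  # Summary of a single column
titanic_train[['Ticket']].iloc[4:9:2]  # Rows at positions 4, 6, and 8
titanic_train['Ticket'].describe()  # Only this last expression is displayed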
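To make the difference between the list returned by `sorted()` and a pandas-native sort concrete, here is a small sketch on a hypothetical five-row frame (`df`, with names and tickets deliberately shuffled). It also checks that a plain `[]` slice and `.iloc` pick the same rows when the index is the default 0, 1, 2, ...

```python
import pandas as pd

# Hypothetical shuffled rows standing in for titanic_train
df = pd.DataFrame({
    "Name": ["Passenger_3", "Passenger_1", "Passenger_2", "Passenger_5", "Passenger_4"],
    "Ticket": ["TCKT1002", "TCKT1000", "TCKT1001", "TCKT1004", "TCKT1003"],
})

as_list = sorted(df["Name"])          # plain Python list, index labels discarded
as_series = df["Name"].sort_values()  # Series, original index labels kept

by_position = df[["Ticket"]].iloc[0:5:2]  # rows at positions 0, 2, 4
by_slice = df[["Ticket"]][0:5:2]          # same rows with the default index

print(as_list)
print(as_series)
print(by_position.equals(by_slice))
```

Both sorts are lexicographic for strings, so in the full dataset `Passenger_10` comes before `Passenger_2`.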

### Adding New Columns
- Create a new column filled with a constant value: `df['new_col'] = value`.
- New columns can also be computed from existing columns.

In [None]:
titanic_train['sudh'] = 'sdffs'  # Placeholder column: every row gets the same string
titanic_train.head()
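For the calculated case, a minimal sketch follows. The column names `FamilySize` and `HighFare` and the 300 threshold are hypothetical choices for illustration, and `df` holds the `SibSp`, `Parch`, and `Fare` values of the first five rows shown earlier.

```python
import pandas as pd

# SibSp, Parch, and Fare values from the first five rows of the dataset
df = pd.DataFrame({
    "SibSp": [0, 0, 2, 1, 2],
    "Parch": [2, 0, 0, 1, 0],
    "Fare": [278.82, 400.08, 401.1, 247.1, 416.93],
})

# Arithmetic on whole columns: siblings/spouses + parents/children + the passenger
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1  # hypothetical column name

# A comparison yields a Boolean column
df["HighFare"] = df["Fare"] > 300  # hypothetical threshold

print(df)
```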

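### Working with the `Cabin` Column
- Convert to string using `.astype(str)`. In this run, missing cabins (22 of the 100 rows, given the count of 78 above) become the text `'nan'`.
- Extract the first character to represent the cabin section; missing cabins therefore get the letter `n`.
- Convert to a categorical type using `pd.Categorical`.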

In [None]:
char_cabin = titanic_train['Cabin'].astype(str)  # Every value as a string
new_Cabin = pd.Categorical([cabin[0] for cabin in char_cabin])  # First character of each cabin
titanic_train['Cabin'] = new_Cabin  # Overwrite the original column
titanic_train.head()
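If the `n` label for missing cabins is unwanted, one option is to fill the gaps before taking the first character. This is a sketch under assumptions, not the notebook's method: `cabin` is a hypothetical five-value Series, and `"Unknown"` (giving the letter `U`) is an arbitrary label.

```python
import numpy as np
import pandas as pd

# Hypothetical cabin values with one missing entry
cabin = pd.Series(["D56", "B78", np.nan, "E45", "C123"], dtype=object)

# str(nan) is "nan", so a plain string conversion labels the missing cabin "n"
letters_str = [str(c)[0] for c in cabin]

# Alternative: name the missing case explicitly, then take the first character
letters_filled = cabin.fillna("Unknown").str[0]  # "U" marks a missing cabin (assumed label)

deck = pd.Categorical(letters_filled)

print(letters_str)
print(deck)
```

The resulting categorical has one category per section letter plus `U` for missing cabins.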