## 🐼 Introduction to Pandas

Pandas is a powerful open-source Python library used for data manipulation, analysis, and cleaning. It is built on top of NumPy and provides two key data structures:

- Series – One-dimensional labeled array.

- DataFrame – Two-dimensional labeled data structure (like a spreadsheet or SQL table).

✅ Key Benefits:
- Handles structured data (rows and columns) easily.

- Handles missing data (shown as `NaN`), with methods to detect, drop, or fill it.

- Makes filtering, aggregation, and reshaping easy, producing data that is ready for visualization.

- Built-in methods for reading and writing files (CSV, Excel, SQL, etc.), such as `pd.read_csv()` and `DataFrame.to_csv()`.
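Two of these benefits, missing-data handling and file I/O, can be sketched on a tiny made-up table (the names and ages are illustrative, not from any dataset). `io.StringIO` stands in for a real file so the sketch runs anywhere; filling with the mean is just one possible strategy.

```python
import io

import numpy as np
import pandas as pd

# Made-up data; np.nan marks the missing age
df = pd.DataFrame({
    "name": ["Ann", "Ben", "Cal"],
    "age": [28, np.nan, 41],
})

# Count missing values per column
missing_per_column = df.isna().sum()

# Fill the gap in "age" with the mean of the known ages
filled = df.fillna({"age": df["age"].mean()})

# Write to CSV in memory, then read it back
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)

print(missing_per_column)
print(filled)
print(round_trip)
```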
📦 Internal Engine:

Pandas is built on top of NumPy, which makes it fast and efficient for numerical and matrix operations, while adding labels and structure for real-world datasets.
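The NumPy foundation is visible directly: a Series wraps a NumPy array, arithmetic is vectorized, and NumPy functions accept a Series and keep its labels. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], index=["x", "y", "z"])

# The values are stored in a NumPy array underneath
arr = s.to_numpy()
print(type(arr))  # <class 'numpy.ndarray'>

# Vectorized arithmetic: no Python loop, labels are preserved
doubled = s * 2
print(doubled)

# NumPy functions (ufuncs) work on a Series and return a Series
roots = np.sqrt(s)
print(roots)
```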



### Series in Pandas
- A Series is a one-dimensional labeled array: a sequence of values plus an index that labels each value.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30, 40, 50])

print(s1)
```

```
0    10
1    20
2    30
3    40
4    50
dtype: int64
```


- Custom labeled index and a name

```python
s2 = pd.Series([100, 200, 300, 400, 500], index=['a', 'b', 'c', 'd', 'e'], name="My Series")
print(s2)
```

```
a    100
b    200
c    300
d    400
e    500
Name: My Series, dtype: int64
```
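A labeled index is more than cosmetic: when two Series are combined, pandas aligns them by label rather than by position. A minimal sketch with made-up prices and discounts:

```python
import pandas as pd

# Two Series with partly overlapping labels
prices = pd.Series([100, 200, 300], index=["a", "b", "c"])
discounts = pd.Series([10, 20], index=["b", "c"])

# Subtraction matches labels; "a" has no partner, so its result is NaN
net = prices - discounts
print(net)

# Label-based lookup on a single Series
print(prices["b"])  # 200
```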


- Creating a Series from a dictionary (the keys become the index)

```python
data = {'a': 100, 'b': 200, 'c': 300, 'd': 400, 'e': 500}

s3 = pd.Series(data)

print(s3)
```

```
a    100
b    200
c    300
d    400
e    500
dtype: int64
```
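When a dictionary is combined with an explicit `index=`, the index selects and orders the keys, and any label missing from the dictionary becomes `NaN`. A small sketch reusing part of the dictionary above (the label `'z'` is a made-up missing key):

```python
import pandas as pd

data = {'a': 100, 'b': 200, 'c': 300}

# 'c' and 'a' are taken from the dict in this order; 'z' is not in it
s = pd.Series(data, index=['c', 'a', 'z'])
print(s)
```

Because `NaN` is a float, the resulting dtype becomes `float64` instead of `int64`.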


- Accessing the index, values, and attributes

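```python
print(s2.index[1])   # label at position 1
print(s2.values[1])  # value at position 1

print(s3.dtype)      # data type of the values

print(s2.name)       # the Series name

print(s2.shape)      # (number of elements,)
```

```
b
200
int64
My Series
(5,)
```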
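A few more everyday attributes and conversions, sketched on the same `s2` Series. The pandas documentation recommends `.to_numpy()` over `.values` for getting the underlying array.

```python
import pandas as pd

s2 = pd.Series([100, 200, 300, 400, 500], index=['a', 'b', 'c', 'd', 'e'], name="My Series")

print(s2.size)             # number of elements
print(s2.index.to_list())  # labels as a plain Python list
print(s2.to_list())        # values as a plain Python list
print(s2.to_numpy()[1])    # same value as s2.values[1]
print(s2.to_dict())        # {label: value}
```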


### Series Indexing and Slicing
- Positional access through the underlying `.values` and `.index` arrays

```python
s4 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'], name='New Series')

print(s4.values[0])

print(s4.index[0])
```

```
1
a
```


- Positional indexing with `iloc[]`

```python
print(s4.iloc[0])
```

```
1
```
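`iloc[]` accepts more than a single position: negative positions count from the end, and a list of positions returns a Series. A short sketch on the same `s4`:

```python
import pandas as pd

s4 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'], name='New Series')

last = s4.iloc[-1]           # last element
picked = s4.iloc[[0, 2, 4]]  # positions 0, 2 and 4
print(last)
print(picked)
```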


- Label-based indexing with `loc[]`

```python
print(s4.loc['b'])
```

```
2
```
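`loc[]` likewise accepts a list of labels or a boolean mask of the same length as the Series. A short sketch on `s4`:

```python
import pandas as pd

s4 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'], name='New Series')

picked = s4.loc[['b', 'd']]  # list of labels
large = s4.loc[s4 > 3]       # boolean mask: keep values greater than 3
print(picked)
print(large)
```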


#### Slicing
- `iloc[]` slices by position and excludes the end position, just like Python list slicing.

```python
print(s4.iloc[0:2])
```

```
a    1
b    2
Name: New Series, dtype: int64
```


- `loc[]` slices by label and includes the end label.

```python
print(s4.loc['a':'c'])
```

```
a    1
b    2
c    3
Name: New Series, dtype: int64
```
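The difference matters most when the index is made of integers that do not match the positions. In this made-up example the labels start at 1, so `loc` and `iloc` return different results for the same number:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=[1, 2, 3, 4])

print(s.loc[1])     # label 1    -> 10
print(s.iloc[1])    # position 1 -> 20
print(s.loc[1:3])   # labels 1..3, end included   -> 10, 20, 30
print(s.iloc[1:3])  # positions 1..2, end excluded -> 20, 30
```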


# 🐼 Pandas DataFrame
A DataFrame is a two-dimensional, tabular data structure with labeled rows and columns; think of it as an Excel sheet or a SQL table.

---

### ✅ Key Features:
- Built on NumPy; each column is a Pandas Series.

- Holds structured/relational data and can be loaded from or saved to formats such as .csv, .json, .xlsx, and SQL tables.

- Supports data manipulation, cleaning, filtering, and aggregation, and offers built-in plotting through `df.plot()` (which uses Matplotlib).

- Columns can be accessed dictionary-style (`df['Age']`) or as attributes (`df.Age`).
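The last two features can be checked directly: both access styles return the same Series, and new columns are assigned dictionary-style. A minimal sketch using part of the sample data below (the column `AgeNextYear` is a made-up example). Attribute access only works when the column name is a valid Python identifier and does not clash with a DataFrame method such as `count`, so the bracket form is the safer habit.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# Dictionary-style and attribute-style access give the same column
by_key = df['Age']
by_attr = df.Age
print(by_key.equals(by_attr))  # True

# Every column is a pandas Series
print(type(df['Name']))

# Add a derived column (vectorized: 1 is added to every row)
df['AgeNextYear'] = df['Age'] + 1
print(df)
```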

```python
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)

print(df)
```

```
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
```
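Before selecting data, it often helps to inspect the DataFrame's basic structure. A short sketch on the same data:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

print(df.shape)             # (rows, columns) -> (3, 3)
print(df.columns.tolist())  # ['Name', 'Age', 'City']
print(df.dtypes)            # data type of each column
```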


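```python
print(df['Name'])           # one column -> Series
print("=" * 40)
print(df[['Name', 'Age']])  # list of columns -> DataFrame
print("=" * 40)
print(df.loc[0])            # row with label 0 -> Series
print("=" * 40)
print(df.loc[0:1])          # labels 0 through 1, inclusive
print("=" * 40)
print(df.iloc[0])           # row at position 0 -> Series
print("=" * 40)
print(df.iloc[0:1])         # positions 0 up to 1, exclusive
```

```
0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object
========================================
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
========================================
Name       Alice
Age           25
City    New York
Name: 0, dtype: object
========================================
    Name  Age         City
0  Alice   25     New York
1    Bob   30  Los Angeles
========================================
Name       Alice
Age           25
City    New York
Name: 0, dtype: object
========================================
    Name  Age      City
0  Alice   25  New York
```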
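`loc[]` and `iloc[]` also take a second argument that selects columns, so rows and columns can be picked in one step. A sketch on the same DataFrame, where both calls select the same block:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# loc: row labels 0..1 (inclusive), columns by name
by_label = df.loc[0:1, ['Name', 'City']]

# iloc: row positions 0..1 (end 2 excluded), columns by position
by_position = df.iloc[0:2, [0, 2]]

print(by_label)
print(by_label.equals(by_position))  # True
```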


## Working with a Real Dataset

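```python
# A raw string (r'...') keeps backslashes in Windows paths from being read as escape sequences.
# This is the author's local path; point it at wherever Titanic-Dataset.csv is stored.
df = pd.read_csv(r'D:\Btech_CS\Python\Pandas\day22\Titanic-Dataset.csv')
```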
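`pd.read_csv` accepts any file-like object as well as a path, which is handy for experimenting without the file on disk. A minimal sketch that feeds it three rows copied from the Titanic preview through `io.StringIO`; the quoted `Name` fields show how commas inside a value are handled.

```python
import io

import pandas as pd

# Three rows taken from the df.head() preview, written as CSV text
csv_text = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
"""

sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)  # (3, 12)
print(sample[['Name', 'Age', 'Cabin']])
```

Empty fields such as the first two `Cabin` values are read as `NaN`.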


```python
# Preview of the DataFrame: the first 5 rows
df.head()
```

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


```python
# The last 5 rows
df.tail()
```

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0 | NaN | S |
| 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0 | B42 | S |
| 888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.45 | NaN | S |
| 889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0 | C148 | C |
| 890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.75 | NaN | Q |
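`head()` and `tail()` take an optional row count, and `describe()` gives summary statistics for the numeric columns. A small sketch on four rows whose values are copied from the preview above:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Montvila, Rev. Juozas', 'Graham, Miss. Margaret Edith',
             'Behr, Mr. Karl Howell', 'Dooley, Mr. Patrick'],
    'Survived': [0, 1, 1, 0],
    'Age': [27.0, 19.0, 26.0, 32.0],
})

print(df.head(2))     # first 2 rows
print(df.tail(1))     # last row only
print(df.shape)       # (4, 3)
print(df.describe())  # count, mean, std, min, quartiles, max of numeric columns
```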


```python
df['Name']
```

```
0                                Braund, Mr. Owen Harris
1      Cumings, Mrs. John Bradley (Florence Briggs Th...
2                                 Heikkinen, Miss. Laina
3           Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                               Allen, Mr. William Henry
                             ...                        
886                                Montvila, Rev. Juozas
887                         Graham, Miss. Margaret Edith
888             Johnston, Miss. Catherine Helen "Carrie"
889                                Behr, Mr. Karl Howell
890                                  Dooley, Mr. Patrick
Name: Name, Length: 891, dtype: object
```

```python
# Survived column (1 = survived, 0 = did not survive)
df['Survived']
```

```
0      0
1      1
2      1
3      1
4      0
      ..
886    0
887    1
888    0
889    1
890    0
Name: Survived, Length: 891, dtype: int64
```

```python
df[['Name', 'Age']]
```

|   | Name | Age |
|---|---|---|
| 0 | Braund, Mr. Owen Harris | 22.0 |
| 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 38.0 |
| 2 | Heikkinen, Miss. Laina | 26.0 |
| 3 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 35.0 |
| 4 | Allen, Mr. William Henry | 35.0 |
| ... | ... | ... |
| 886 | Montvila, Rev. Juozas | 27.0 |
| 887 | Graham, Miss. Margaret Edith | 19.0 |
| 888 | Johnston, Miss. Catherine Helen "Carrie" | NaN |
| 889 | Behr, Mr. Karl Howell | 26.0 |
| 890 | Dooley, Mr. Patrick | 32.0 |


```python
df.loc[0]  # row with index label 0 (here also the first row)
```

```
PassengerId                          1
Survived                             0
Pclass                               3
Name           Braund, Mr. Owen Harris
Sex                               male
Age                               22.0
SibSp                                1
Parch                                0
Ticket                       A/5 21171
Fare                              7.25
Cabin                              NaN
Embarked                             S
Name: 0, dtype: object
```

```python
df.loc[0, 'Name']  # value in row 0, column 'Name'
```

```
'Braund, Mr. Owen Harris'
```
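For a single value, pandas also provides `.at` (row label + column name) and `.iat` (row position + column position), which are optimized for scalar access. A sketch on two rows copied from the dataset preview:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Braund, Mr. Owen Harris', 'Heikkinen, Miss. Laina'],
    'Age': [22.0, 26.0],
})

print(df.at[0, 'Name'])  # same value as df.loc[0, 'Name']
print(df.iat[1, 1])      # row position 1, column position 1 ('Age') -> 26.0
```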

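```python
# Slice rows by position: 0 through 3 (position 4 is excluded)
df.iloc[0:4]
```

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |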


## Conditional Selection

```python
# All rows where the passenger's age is greater than 70
# Pattern: df[condition]
df[df['Age'] > 70]
```

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 96 | 97 | 0 | 1 | Goldschmidt, Mr. George B | male | 71.0 | 0 | 0 | PC 17754 | 34.6542 | A5 | C |
| 116 | 117 | 0 | 3 | Connors, Mr. Patrick | male | 70.5 | 0 | 0 | 370369 | 7.75 | NaN | Q |
| 493 | 494 | 0 | 1 | Artagaveytia, Mr. Ramon | male | 71.0 | 0 | 0 | PC 17609 | 49.5042 | NaN | C |
| 630 | 631 | 1 | 1 | Barkworth, Mr. Algernon Henry Wilson | male | 80.0 | 0 | 0 | 27042 | 30.0 | A23 | S |
| 851 | 852 | 0 | 3 | Svensson, Mr. Johan | male | 74.0 | 0 | 0 | 347060 | 7.775 | NaN | S |
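The same filter can be written with `DataFrame.query()`, and `Series.isin()` / `Series.between()` cover common membership and range checks. A sketch on four rows copied from the dataset (`between` includes both ends by default):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Goldschmidt, Mr. George B', 'Connors, Mr. Patrick',
             'Barkworth, Mr. Algernon Henry Wilson', 'Braund, Mr. Owen Harris'],
    'Age': [71.0, 70.5, 80.0, 22.0],
    'Pclass': [1, 3, 1, 3],
})

mask_result = df[df['Age'] > 70]
query_result = df.query('Age > 70')   # same filter as a string expression
print(mask_result.equals(query_result))  # True

first_class = df[df['Pclass'].isin([1])]       # membership test
seventies = df[df['Age'].between(70, 79)]      # 70 <= Age <= 79
print(first_class)
print(seventies)
```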


```python
# Names of all passengers who survived
df[df['Survived'] == 1]['Name']
```

```
1      Cumings, Mrs. John Bradley (Florence Briggs Th...
2                                 Heikkinen, Miss. Laina
3           Futrelle, Mrs. Jacques Heath (Lily May Peel)
8      Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
9                    Nasser, Mrs. Nicholas (Adele Achem)
                             ...                        
875                     Najib, Miss. Adele Kiamie "Jane"
879        Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)
880         Shelley, Mrs. William (Imanita Parrish Hall)
887                         Graham, Miss. Margaret Edith
889                                Behr, Mr. Karl Howell
Name: Name, Length: 342, dtype: object
```

```python
# Names of passengers who survived AND were younger than 20.
# Wrap each condition in parentheses and combine them with & (element-wise "and").
df[(df['Survived'] == 1) & (df['Age'] < 20)]['Name']
```
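Combining conditions uses the element-wise operators `&` (and), `|` (or) and `~` (not); Python's `and`/`or` raise an error on a Series. The parentheses are required because `&` binds more tightly than `==` and `<`. A sketch on a made-up four-passenger table (names `P1` to `P4` are hypothetical); a missing age compares as `False`:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['P1', 'P2', 'P3', 'P4'],  # hypothetical passengers
    'Survived': [1, 1, 0, 1],
    'Age': [15.0, 38.0, 12.0, None],   # None becomes NaN
})

young_survivors = df[(df['Survived'] == 1) & (df['Age'] < 20)]['Name']
died_or_young = df[(df['Survived'] == 0) | (df['Age'] < 20)]['Name']
not_survived = df[~(df['Survived'] == 1)]['Name']

print(young_survivors.tolist())  # ['P1']
print(died_or_young.tolist())    # ['P1', 'P3']
print(not_survived.tolist())     # ['P3']
```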

```python
# Count survivors and non-survivors
survived = len(df[df['Survived'] == 1])
print("people survived : ", survived)

notsurvived = len(df[df['Survived'] == 0])
print("people notsurvived : ", notsurvived)

print(notsurvived + survived)  # equals the total number of passengers
```

```
people survived :  342
people notsurvived :  549
891
```
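`Series.value_counts()` produces both counts in one call. A sketch on a made-up five-value `Survived` column; applied to `df['Survived']` it would report the same 342 / 549 split as above.

```python
import pandas as pd

survived = pd.Series([0, 1, 1, 1, 0], name='Survived')  # made-up values

counts = survived.value_counts()  # one row per distinct value
print(counts)

# Summing a boolean mask is another way to count matches
print((survived == 1).sum())  # 3
```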


```python
survivalpercentage = (survived / (notsurvived + survived)) * 100
print("Survival Percentage : ", survivalpercentage)

print("=" * 40)

notsurvivedpercentage = (notsurvived / (notsurvived + survived)) * 100
print("Not Survival Percentage : ", notsurvivedpercentage)
print("=" * 40)
```

```
Survival Percentage :  38.38383838383838
========================================
Not Survival Percentage :  61.61616161616161
========================================
```
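The same percentages can be computed without manual division: `value_counts(normalize=True)` returns proportions, and because `Survived` is 0/1, its mean is the survival rate. A sketch on made-up values; on the Titanic column, `df['Survived'].mean() * 100` equals 342 / 891 × 100, the same 38.38% as above.

```python
import pandas as pd

survived = pd.Series([0, 1, 1, 1, 0], name='Survived')  # made-up values

shares = survived.value_counts(normalize=True) * 100  # percentage per value
print(shares)

rate = survived.mean() * 100  # mean of a 0/1 column = share of 1s
print(round(rate, 2))
```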
