# Introduction to Pandas

Pandas is a Python library used for data manipulation and analysis. It provides fast, flexible, and easy-to-use data structures for working with structured data.

---

## Why Use Pandas
- Handles tabular data efficiently
- Supports missing data
- Provides powerful indexing and filtering
- Integrates with NumPy and Matplotlib
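
As a rough illustration of the points above, here is a minimal sketch using a small, made-up Series (the `temps` data and its labels are hypothetical, not from this notebook). It shows missing-value handling, boolean filtering, and handing data over to NumPy.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one reading is missing (NaN)
temps = pd.Series([21.5, np.nan, 19.0, 23.5], index=["mon", "tue", "wed", "thu"])

n_missing = temps.isna().sum()   # count the missing entries
mean_temp = temps.mean()         # NaN is skipped by default
warm_days = temps[temps > 20]    # boolean filtering keeps the matching labels
as_array = temps.to_numpy()      # the same data as a NumPy ndarray

print(n_missing)
print(warm_days)
```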

---

## Importing Pandas

In [138]:
import numpy as np
import pandas as pd

## Series from lists

In [139]:
# string
country = ['India', 'Pakistan', 'USA', 'Nepal', 'Srilanka']

pd.Series(country)

0       India
1    Pakistan
2         USA
3       Nepal
4    Srilanka
dtype: object

In [140]:
# integers
runs = [13, 24, 56, 78, 100]

runs_ser = pd.Series(runs)

In [141]:
# custom index
marks = [67,57,89,100]
subjects = ['maths', 'english', 'science', 'hindi']

pd.Series(marks, index=subjects)

maths       67
english     57
science     89
hindi      100
dtype: int64

In [142]:
# setting a name
marks = pd.Series(marks, index=subjects, name="Danish ke marks")

marks

maths       67
english     57
science     89
hindi      100
Name: Danish ke marks, dtype: int64

## Series from dict

In [143]:
marks = {
    'maths':67,
    'english':57,
    'science':89,
    'hindi':100
}
marks_series = pd.Series(marks,name='danish ke marks')
marks_series

maths       67
english     57
science     89
hindi      100
Name: danish ke marks, dtype: int64
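
When a dict is passed, its keys become the index. A minimal sketch of what happens when `index=` is also given: it selects and reorders the keys, and a label that is not in the dict gets `NaN`. The label `'sanskrit'` below is a hypothetical addition for illustration.

```python
import pandas as pd

marks = {'maths': 67, 'english': 57, 'science': 89, 'hindi': 100}

# Keys become the index, values become the data
from_dict = pd.Series(marks)

# index= selects and reorders keys; 'sanskrit' (hypothetical) is not in
# the dict, so its value is NaN and the dtype is upcast to float64
picked = pd.Series(marks, index=['hindi', 'maths', 'sanskrit'])

print(picked)
```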

## Series Attributes

In [144]:
# size
marks_series.size

4

In [145]:
# dtype
marks_series.dtype

dtype('int64')

In [146]:
# name
marks_series.name

'danish ke marks'

In [147]:
# is_unique
marks_series.is_unique  # True, but not shown: a cell only echoes its last expression

pd.Series([1,1,2,3,4,5]).is_unique  # 1 appears twice

False

In [148]:
# index
marks_series.index

Index(['maths', 'english', 'science', 'hindi'], dtype='object')

In [149]:
runs_ser.index

RangeIndex(start=0, stop=5, step=1)

In [150]:
# values
marks_series.values

array([ 67,  57,  89, 100])

In [151]:
type(marks_series.values)

numpy.ndarray
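
The attributes above can be collected in one place. This is only a sketch: it rebuilds `marks_series` so that it runs on its own, and it also uses `shape` and `to_numpy()`, which do not appear in the cells above.

```python
import numpy as np
import pandas as pd

marks_series = pd.Series(
    {'maths': 67, 'english': 57, 'science': 89, 'hindi': 100},
    name='danish ke marks',
)

summary = {
    'size': marks_series.size,            # number of elements
    'shape': marks_series.shape,          # a 1-tuple for a Series
    'name': marks_series.name,            # label of the Series
    'is_unique': marks_series.is_unique,  # True when no value repeats
}
arr = marks_series.to_numpy()             # same data as .values here

print(summary)
```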

## Series using read_csv
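`read_csv` returns a DataFrame even when the file holds a single column that is logically a Series. `.squeeze()` handles that case:
 - It drops axes of length one, so a one-column DataFrame becomes a Series.
 - It returns a new object; the original DataFrame is not modified.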
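
A minimal, self-contained sketch of the idea, assuming a tiny in-memory CSV in place of the real `subs.csv` (the `csv_text` content is hypothetical apart from the column name). Passing `"columns"` to `squeeze` only collapses the column axis.

```python
import io
import pandas as pd

# Hypothetical one-column CSV held in memory, standing in for subs.csv
csv_text = "Subscribers gained\n48\n57\n40\n"

df = pd.read_csv(io.StringIO(csv_text))
ser = df.squeeze("columns")   # returns a new Series; df is left unchanged

print(type(df).__name__)      # DataFrame
print(type(ser).__name__)     # Series
```

Because `squeeze` does not work in place, the result has to be assigned to a name to be kept.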

In [152]:
# with one col
subs = pd.read_csv(r'D:\Code Playground\Python for ML\DSMP_2\Pandas\dataset-session-16\subs.csv')
type(subs)

pandas.core.frame.DataFrame

In [153]:
# subs = pd.read_csv(r'D:\Code Playground\Python for ML\DSMP_2\Pandas\dataset-session-16\subs.csv').squeeze() # .squeeze() used to
# type(subs)

In [154]:
subs.squeeze()

0       48
1       57
2       40
3       43
4       44
      ... 
360    231
361    226
362    155
363    144
364    172
Name: Subscribers gained, Length: 365, dtype: int64

In [155]:
type(subs)  # still a DataFrame: squeeze() returned a new Series that was never assigned

pandas.core.frame.DataFrame

In [156]:
# with two col
kohli = pd.read_csv(r"D:\Code Playground\Python for ML\DSMP_2\Pandas\dataset-session-16\kohli_ipl.csv", index_col='match_no').squeeze()
type(kohli)

pandas.core.series.Series

In [157]:
kohli

match_no
1       1
2      23
3      13
4      12
5       1
       ..
211     0
212    20
213    73
214    25
215     7
Name: runs, Length: 215, dtype: int64
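
With two columns, one of them can serve as the index so that a single data column remains to squeeze. A sketch using a small in-memory CSV; its three rows copy the first rows of the output above, and `csv_text` is a stand-in for the real file.

```python
import io
import pandas as pd

# Stand-in for kohli_ipl.csv: two columns, match_no and runs
csv_text = "match_no,runs\n1,1\n2,23\n3,13\n"

# match_no becomes the index, leaving one data column to squeeze
runs = pd.read_csv(io.StringIO(csv_text), index_col="match_no").squeeze("columns")

print(runs)
```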

In [158]:
movies = pd.read_csv(r"D:\Code Playground\Python for ML\DSMP_2\Pandas\dataset-session-16\bollywood.csv", index_col='movie').squeeze()
type(movies)

pandas.core.series.Series

In [159]:
movies

movie
Uri: The Surgical Strike                   Vicky Kaushal
Battalion 609                                Vicky Ahuja
The Accidental Prime Minister (film)         Anupam Kher
Why Cheat India                            Emraan Hashmi
Evening Shadows                         Mona Ambegaonkar
                                              ...       
Hum Tumhare Hain Sanam                    Shah Rukh Khan
Aankhen (2002 film)                     Amitabh Bachchan
Saathiya (film)                             Vivek Oberoi
Company (film)                                Ajay Devgn
Awara Paagal Deewana                        Akshay Kumar
Name: lead, Length: 1500, dtype: object

## Series Methods

In [160]:
subs.head() # top 5

   Subscribers gained
0                  48
1                  57
2                  40
3                  43
4                  44


In [161]:
kohli.head(10)

match_no
1      1
2     23
3     13
4     12
5      1
6      9
7     34
8      0
9     21
10     3
Name: runs, dtype: int64

In [162]:
kohli.tail() # bottom 5

match_no
211     0
212    20
213    73
214    25
215     7
Name: runs, dtype: int64
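
`head` and `tail` are methods, so they need parentheses. Without them, Python only shows the bound method object and no rows are selected. A short sketch on a small stand-in Series (the values are the first seven from the Kohli output above):

```python
import pandas as pd

runs = pd.Series([1, 23, 13, 12, 1, 9, 34], name="runs")

first_two = runs.head(2)   # first 2 rows
last_five = runs.tail()    # n defaults to 5
method_obj = runs.tail     # no parentheses: just the method, nothing is computed

print(last_five)
```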

In [163]:
# sample -> returns randomly chosen rows (one row by default)
movies.sample() 

movie
Love Sonia    Abhishek Bharate
Name: lead, dtype: object

In [164]:
movies.sample(5) # random 5

movie
Bhagmati (2005 film)                 Tabu
Filmistaan                  Sharib Hashmi
Tanu Weds Manu: Returns    Kangana Ranaut
My Wife's Murder              Anil Kapoor
Ek Thi Rani Aisi Bhi          Hema Malini
Name: lead, dtype: object
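
Because `sample` is random, each run gives different rows. If repeatable output is wanted, the `random_state` parameter can be set. A sketch with a hypothetical six-element Series:

```python
import pandas as pd

# Hypothetical stand-in for the movies Series
leads = pd.Series(
    ["A", "B", "C", "D", "E", "F"],
    index=["m1", "m2", "m3", "m4", "m5", "m6"],
    name="lead",
)

pick1 = leads.sample(3, random_state=42)
pick2 = leads.sample(3, random_state=42)   # same seed, same rows

print(pick1.equals(pick2))
```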

In [165]:
# value_counts -> movies
movies.value_counts()

lead
Akshay Kumar            48
Amitabh Bachchan        45
Ajay Devgn              38
Salman Khan             31
Sanjay Dutt             26
                        ..
Tanishaa Mukerji         1
Tanuja                   1
Ankit                    1
Rakhee Gulzar            1
Geetika Vidya Ohlyan     1
Name: count, Length: 566, dtype: int64
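
`value_counts` sorts by frequency, most common first. As a small sketch on toy data (the four-row `leads` Series is made up for illustration), `normalize=True` returns shares instead of raw counts:

```python
import pandas as pd

# Toy data: one actor appears twice
leads = pd.Series(["Akshay Kumar", "Ajay Devgn", "Akshay Kumar", "Salman Khan"])

counts = leads.value_counts()                 # raw counts, most frequent first
shares = leads.value_counts(normalize=True)   # fractions that sum to 1

print(counts)
```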

In [166]:
kohli.sort_values() # returns a sorted copy; kohli itself is unchanged

match_no
8        0
87       0
93       0
91       0
206      0
      ... 
164    100
120    100
123    108
126    109
128    113
Name: runs, Length: 215, dtype: int64

In [167]:
kohli.sort_values(ascending=False)

match_no
128    113
126    109
123    108
120    100
164    100
      ... 
93       0
130      0
206      0
207      0
211      0
Name: runs, Length: 215, dtype: int64

In [168]:
kohli.sort_values(ascending=False).head(1).values

array([113])
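
Sorting and taking the first row works, but if only the top value is needed, `max` and `idxmax` are a more direct option. A sketch on a small hypothetical Series (not the real Kohli data):

```python
import pandas as pd

runs = pd.Series(
    [1, 23, 113, 12],
    index=pd.Index([1, 2, 3, 4], name="match_no"),
    name="runs",
)

top_via_sort = runs.sort_values(ascending=False).head(1).values[0]
top_direct = runs.max()      # highest value
top_match = runs.idxmax()    # index label where the highest value occurs

print(top_direct, top_match)
```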

In [169]:
kohli.sort_values(inplace=True)

ValueError: This Series is a view of some other array, to sort in-place you must create a copy
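
The error message says the Series shares its data with another array. Below is a minimal sketch of two common ways around it, using a small hypothetical Series rather than the Kohli data. Whether the error appears at all may depend on the pandas version and on how the Series was created, so this is an illustration, not a reproduction.

```python
import pandas as pd

runs = pd.Series([23, 1, 13], index=[2, 1, 3], name="runs")

# Option 1: keep the sorted result under a name instead of sorting in place
runs_sorted = runs.sort_values()

# Option 2: take an explicit copy, then sort that copy in place
runs_copy = runs.copy()
runs_copy.sort_values(inplace=True)

print(runs_sorted)
```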

In [170]:
# sort_index -> inplace -> movies
movies.sort_index(inplace=True)
movies

movie
1920 (film)                   Rajniesh Duggall
1920: London                     Sharman Joshi
1920: The Evil Returns             Vicky Ahuja
1971 (2007 film)                Manoj Bajpayee
2 States (2014 film)              Arjun Kapoor
                                   ...        
Zindagi 50-50                      Veena Malik
Zindagi Na Milegi Dobara        Hrithik Roshan
Zindagi Tere Naam           Mithun Chakraborty
Zokkomon                       Darsheel Safary
Zor Lagaa Ke...Haiya!            Meghan Jadhav
Name: lead, Length: 1500, dtype: object
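
A short sketch contrasting the two ways of calling `sort_index`, on three entries taken from the output above. Without `inplace=True` a new Series is returned; with it, the Series itself is reordered.

```python
import pandas as pd

leads = pd.Series(
    ["Hrithik Roshan", "Sharman Joshi", "Arjun Kapoor"],
    index=["Zindagi Na Milegi Dobara", "1920: London", "2 States (2014 film)"],
    name="lead",
)

desc = leads.sort_index(ascending=False)   # new Series; leads is not changed yet
leads.sort_index(inplace=True)             # reorders leads itself

print(leads)
```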