# Introduction to Pandas

Pandas is a powerful open-source Python library for data manipulation and analysis. It is one of the most popular libraries used in data science and machine learning. Pandas provides two main data structures:

- Series: A one-dimensional, labeled, array-like object.
- DataFrame: A two-dimensional, tabular data structure with labeled axes (rows and columns).

Pandas makes it easy to:

- Load and explore data.
- Clean and transform data.
- Perform statistical analysis.

We learn Pandas in data collection and preprocessing because it:

- Simplifies working with tabular datasets.
- Provides robust tools for cleaning, transforming, and analyzing data.
- Bridges raw data and machine learning pipelines.
- Integrates with other Python libraries, such as NumPy, Matplotlib, and scikit-learn.

Mastering Pandas is essential for handling real-world data efficiently in data science.

## Installing and Importing Pandas

In [None]:
# !pip install pandas  # Install Pandas (if not already installed)
import pandas as pd

## Pandas Data Structures

### Pandas Series
A Series is like a single column in an Excel spreadsheet: a one-dimensional array in which every value has an index label. It can hold data of any type.

Compared with a plain list or array, a Series is:

- Designed for data manipulation and analysis.
- More useful when the data is labeled or needs an index (e.g., time series data).
- Well suited to filtering, aggregation, and statistical operations.
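
The points above can be illustrated with a minimal sketch. The temperature readings and the name `temps` are hypothetical and used only for illustration; the calls themselves (`sum`, `mean`, boolean masking) are standard Series operations.

```python
import pandas as pd

# Hypothetical daily temperatures, labeled by weekday (illustrative data only)
temps = pd.Series([21.5, 23.0, 19.0, 25.5, 22.0],
                  index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'])

warm_days = temps[temps > 22]   # filtering with a boolean mask
total = temps.sum()             # aggregation
average = temps.mean()          # basic statistic

print(warm_days)
print("Total:", total)
print("Mean:", average)
```

The filtered result keeps the original labels, which is the main practical difference from filtering a plain list.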

In [None]:

data = [a**2 for a in range(20)]  # squares of 0..19
print(data)
labels = list('abcdefghijklmnopqrst')  # one label per value
series = pd.Series(data, dtype='int64', index=labels)
print("Pandas Series:")
print(series)


[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361]
Pandas Series:
a      0
b      1
c      4
d      9
e     16
f     25
g     36
h     49
i     64
j     81
k    100
l    121
m    144
n    169
o    196
p    225
q    256
r    289
s    324
t    361
dtype: int64


In [None]:
# Fibonacci sequence built iteratively: each term depends on the two before it,
# which a plain list comprehension cannot express

def fibonacci_sequence(n):
  """
  Returns the first n terms of the Fibonacci sequence as a list.
  """
  sequence = []
  a, b = 0, 1
  for _ in range(n):
    sequence.append(a)
    a, b = b, a + b
  return sequence

# Example usage:
n = 20  # Change this to generate a different number of terms
fib_seq = fibonacci_sequence(n)
fib_seq


[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]

In [None]:
# Create the Fibonacci sequence with a loop
# Prints 1, 2, 3, 5, ... (the terms that follow the initial 0 and 1)
a = 0
b = 1
for i in range(20):
    c = a + b
    a = b
    b = c
    print(c)

In a printed Series, the left-hand column is the index (integers starting from 0 unless custom labels are supplied).
The right-hand column holds the values in the Series.

In [None]:
s1 = pd.Series(['Muhammad', 'Afnan', 'Ahmed','Bhutta'])
s1

0    Muhammad
1       Afnan
2       Ahmed
3      Bhutta
dtype: object


In [None]:
# Accessing Elements from series
s1[2]

'Ahmed'

In [None]:
print(type(s1))

<class 'pandas.core.series.Series'>


In [None]:
s2 = pd.Series([12, 24, 36, 12, 6, 12, 24],
               index=['apples', 'oranges', 'bananas', 'samosas', 'rolls', 'chickens', 'coldrinks'])
# Custom string labels replace the default integer index
s2

apples       12
oranges      24
bananas      36
samosas      12
rolls         6
chickens     12
coldrinks    24
dtype: int64


In [None]:
s2.index

Index(['apples', 'oranges', 'bananas', 'samosas', 'rolls', 'chickens',
       'coldrinks'],
      dtype='object')

In [None]:
s2.values

array([12, 24, 36, 12,  6, 12, 24])

In [None]:
s2['apples']

12

In [None]:
# Positional access: use .iloc, because s2[2] on a label-indexed Series is deprecated
s2.iloc[2]

36
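
The two lookups above (by label and by position) can be made explicit with `.loc` and `.iloc`. This is a small sketch on a hypothetical three-item Series named `prices`; it is meant to show the difference between the two accessors, not to prescribe one style.

```python
import pandas as pd

# Hypothetical Series with string labels (illustrative data only)
prices = pd.Series([12, 24, 36], index=['apples', 'oranges', 'bananas'])

by_label = prices.loc['bananas']              # label-based lookup
by_position = prices.iloc[2]                  # position-based lookup (0-indexed)
first_two = prices.iloc[:2]                   # positional slice, end excluded
label_slice = prices.loc['apples':'oranges']  # label slice, end included

print(by_label, by_position)
print(first_two)
print(label_slice)
```

Note that a positional slice excludes its end point, while a label slice includes it.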

In [None]:
s2*2

apples       24
oranges      48
bananas      72
samosas      24
rolls        12
chickens     24
coldrinks    48
dtype: int64


In [None]:
import pandas as pd
import numpy as np

### Pandas DataFrame

A DataFrame is a two-dimensional data structure, similar to an Excel spreadsheet or SQL table. It consists of rows and columns, where each column can be a different data type.
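
As a rough illustration of columns holding different data types, the sketch below builds a small hypothetical table (`products` and its contents are made up) and inspects `dtypes`. The exact dtype reported for the text column depends on the Pandas version.

```python
import pandas as pd

# Hypothetical product table (illustrative data only)
products = pd.DataFrame({
    'item': ['pen', 'notebook', 'bag'],   # text
    'price': [1.5, 3.25, 20.0],           # floating point
    'in_stock': [True, False, True],      # boolean
    'quantity': [100, 40, 5],             # integer
})

# One dtype per column
print(products.dtypes)
```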

In [None]:
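# Creating a DataFrame from a dictionary
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles']
}
df = pd.DataFrame(data)  # rows get the default integer index 0, 1, 2

print("Pandas DataFrame:")
print(df)
# Look up a single value by row label and column name
print(df.loc[2, 'Name'])


Pandas DataFrame:
      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
Charlie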


Each key in the dictionary represents a column name.
Each value is a list that forms the data in that column.
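
To make the key-to-column mapping concrete, here is a minimal sketch that builds the same small table two ways: column by column from a dictionary of lists, and row by row from a list of dictionaries. The names `columns_first` and `rows_first` are illustrative only.

```python
import pandas as pd

# Dictionary of lists: each key becomes a column
columns_first = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
})

# List of dictionaries: each dictionary becomes a row
rows_first = pd.DataFrame([
    {'Name': 'Alice', 'Age': 25},
    {'Name': 'Bob', 'Age': 30},
])

print(list(columns_first.columns))
print(columns_first.equals(rows_first))
```

With a dictionary of lists, all lists must have the same length; otherwise Pandas raises a `ValueError`.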

In [None]:
# Creating a DataFrame
result = {
    'name': ['Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo', 'Foxtrot', 'Golf', 'Hotel', 'India', 'Juliet'],
    'roll': [11, 22, 33, 44, 55, 66, 77, 88, 99, 111],
    'python': [78, 67, 89, 90, 91, 72, 76, 89, 67, 85],
    'excel': [89, 87, 67, 74, 78, 90, 76, 78, 90, 54],
    'power_bi': [45, 67, 89, 87, 67, 90, 65, 67, 56, 90],
    'pandas': [81, 85, 89, 87, 41, 45, 96, 56, 93, 78],
    'machine_learning': [96, 95, 84, 85, 82, 81, 74, 75, 96, 90],
    'statistics': [67, 48, 45, 98, 75, 71, 70, 60, 90, 93],
}

df1 = pd.DataFrame(result)
df1


      name  roll  python  excel  power_bi  pandas  machine_learning  statistics
0    Alpha    11      78     89        45      81                96          67
1    Bravo    22      67     87        67      85                95          48
2  Charlie    33      89     67        89      89                84          45
3    Delta    44      90     74        87      87                85          98
4     Echo    55      91     78        67      41                82          75
5  Foxtrot    66      72     90        90      45                81          71
6     Golf    77      76     76        65      96                74          70
7    Hotel    88      89     78        67      56                75          60
8    India    99      67     90        56      93                96          90
9   Juliet   111      85     54        90      78                90          93


In [None]:
type(df1)

## Basic DataFrame Operations

In [None]:
# Basic operations
print(df1)
print("Shape of DataFrame:", df1.shape)
print("\nSummary statistics:")
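print(df1.describe())


      name  roll  python  excel  power_bi  pandas  machine_learning  \
0    Alpha    11      78     89        45      81                96
1    Bravo    22      67     87        67      85                95
2  Charlie    33      89     67        89      89                84
3    Delta    44      90     74        87      87                85
4     Echo    55      91     78        67      41                82
5  Foxtrot    66      72     90        90      45                81
6     Golf    77      76     76        65      96                74
7    Hotel    88      89     78        67      56                75
8    India    99      67     90        56      93                96
9   Juliet   111      85     54        90      78                90

   statistics
0          67
1          48
2          45
3          98
4          75
5          71
6          70
7          60
8          90
9          93
Shape of DataFrame: (10, 8)

Summary statistics:
             roll    python      excel   power_bi     pandas  \
count   10.000000  10.00000  10.000000  10.000000  10.000000
mean    60.600000  80.40000  78.300000  72.300000  75.100000
std     33.470385   9.59398  11.576317  15.881855  20.184978
min     11.000000  67.00000  54.000000  45.000000  41.000000
25%     35.750000  73.00000  74.500000  65.500000  61.500000
50%     60.500000  81.50000  78.000000  67.000000  83.000000
75%     85.250000  89.00000  88.500000  88.500000  88.500000
max    111.000000  91.00000  90.000000  90.000000  96.000000

       machine_learning  statistics
count         10.000000   10.000000
mean          85.800000   71.700000
std            8.216515   18.037307
min           74.000000   45.000000
25%           81.250000   61.750000
50%           84.500000   70.500000
75%           93.750000   86.250000
max           96.000000   98.000000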

In [None]:
# Basic operations
# df1.head()      # Displays the first few rows of the DataFrame.
# df1.tail()      # Displays the last few rows.
# df1.shape       # Shows the number of rows and columns.
# df1.columns     # Displays the column names.
# df1.describe()  # Generates summary statistics for numerical columns.
print(df1.describe())


             roll    python      excel   power_bi     pandas  \
count   10.000000  10.00000  10.000000  10.000000  10.000000   
mean    60.600000  80.40000  78.300000  72.300000  75.100000   
std     33.470385   9.59398  11.576317  15.881855  20.184978   
min     11.000000  67.00000  54.000000  45.000000  41.000000   
25%     35.750000  73.00000  74.500000  65.500000  61.500000   
50%     60.500000  81.50000  78.000000  67.000000  83.000000   
75%     85.250000  89.00000  88.500000  88.500000  88.500000   
max    111.000000  91.00000  90.000000  90.000000  96.000000   

       machine_learning  statistics  
count         10.000000   10.000000  
mean          85.800000   71.700000  
std            8.216515   18.037307  
min           74.000000   45.000000  
25%           81.250000   61.750000  
50%           84.500000   70.500000  
75%           93.750000   86.250000  
max           96.000000   98.000000  
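
The inspection helpers listed in the comments above can be tried on a smaller table. The sketch below uses a hypothetical six-row DataFrame named `scores`; the method names are standard Pandas, while the data is made up.

```python
import pandas as pd

# Hypothetical scores table (illustrative data only)
scores = pd.DataFrame({
    'student': ['A', 'B', 'C', 'D', 'E', 'F'],
    'math': [70, 85, 90, 60, 75, 95],
})

first_rows = scores.head(3)     # first 3 rows (default is 5)
last_rows = scores.tail(2)      # last 2 rows (default is 5)
n_rows, n_cols = scores.shape   # (rows, columns)
stats = scores.describe()       # numeric columns only by default

print(first_rows)
print(last_rows)
print(n_rows, n_cols)
print(stats.loc['mean', 'math'])
```

Because `describe()` skips non-numeric columns by default, `stats` here contains only the `math` column.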


In [None]:
df1.sample()  # returns one randomly chosen row, so the output changes between runs

   name  roll  python  excel  power_bi  pandas  machine_learning  statistics
6  Golf    77      76     76        65      96                74          70


In [None]:
df1.columns

Index(['name', 'roll', 'python', 'excel', 'power_bi', 'pandas',
       'machine_learning', 'statistics'],
      dtype='object')

In [None]:
df1.columns = ['Name','Roll','Python','Excel','Power_bi','Pandas','Machine_learning','Statistics']
df1.columns

Index(['Name', 'Roll', 'Python', 'Excel', 'Power_bi', 'Pandas',
       'Machine_learning', 'Statistics'],
      dtype='object')

In [None]:
df1

      Name  Roll  Python  Excel  Power_bi  Pandas  Machine_learning  Statistics
0    Alpha    11      78     89        45      81                96          67
1    Bravo    22      67     87        67      85                95          48
2  Charlie    33      89     67        89      89                84          45
3    Delta    44      90     74        87      87                85          98
4     Echo    55      91     78        67      41                82          75
5  Foxtrot    66      72     90        90      45                81          71
6     Golf    77      76     76        65      96                74          70
7    Hotel    88      89     78        67      56                75          60
8    India    99      67     90        56      93                96          90
9   Juliet   111      85     54        90      78                90          93


In [None]:
# rename() returns a new DataFrame; df1 itself is not modified
dff = df1.rename(columns={'Roll': 'ID', 'Machine_learning': 'ML', 'Statistics': 'Stats'})
dff

      Name   ID  Python  Excel  Power_bi  Pandas  ML  Stats
0    Alpha   11      78     89        45      81  96     67
1    Bravo   22      67     87        67      85  95     48
2  Charlie   33      89     67        89      89  84     45
3    Delta   44      90     74        87      87  85     98
4     Echo   55      91     78        67      41  82     75
5  Foxtrot   66      72     90        90      45  81     71
6     Golf   77      76     76        65      96  74     70
7    Hotel   88      89     78        67      56  75     60
8    India   99      67     90        56      93  96     90
9   Juliet  111      85     54        90      78  90     93


In [None]:
df1  # still has the original column names


      Name  Roll  Python  Excel  Power_bi  Pandas  Machine_learning  Statistics
0    Alpha    11      78     89        45      81                96          67
1    Bravo    22      67     87        67      85                95          48
2  Charlie    33      89     67        89      89                84          45
3    Delta    44      90     74        87      87                85          98
4     Echo    55      91     78        67      41                82          75
5  Foxtrot    66      72     90        90      45                81          71
6     Golf    77      76     76        65      96                74          70
7    Hotel    88      89     78        67      56                75          60
8    India    99      67     90        56      93                96          90
9   Juliet   111      85     54        90      78                90          93


In [None]:
# inplace=True modifies df1 directly and returns None
df1.rename(columns={'Roll': 'Id', 'Statistics': 'Stats'}, inplace=True)

In [None]:
df1

      Name   Id  Python  Excel  Power_bi  Pandas  Machine_learning  Stats
0    Alpha   11      78     89        45      81                96     67
1    Bravo   22      67     87        67      85                95     48
2  Charlie   33      89     67        89      89                84     45
3    Delta   44      90     74        87      87                85     98
4     Echo   55      91     78        67      41                82     75
5  Foxtrot   66      72     90        90      45                81     71
6     Golf   77      76     76        65      96                74     70
7    Hotel   88      89     78        67      56                75     60
8    India   99      67     90        56      93                96     90
9   Juliet  111      85     54        90      78                90     93
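
The difference between the two renaming styles used above can be condensed into a short sketch. The DataFrame `marks` is hypothetical. Reassigning the result of `rename()` is shown here as one common alternative to `inplace=True`, not as the only correct approach.

```python
import pandas as pd

# Hypothetical two-column table (illustrative data only)
marks = pd.DataFrame({'Roll': [11, 22], 'Statistics': [67, 48]})

renamed = marks.rename(columns={'Roll': 'ID'})  # returns a new DataFrame
print(list(marks.columns))    # original is left unchanged
print(list(renamed.columns))

# Reassigning the result has the same effect as inplace=True
marks = marks.rename(columns={'Statistics': 'Stats'})
print(list(marks.columns))
```

Keys in the mapping that do not match an existing column are ignored by default, so a misspelled name fails silently.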


## Selecting Data in a DataFrame

In [None]:
# Selecting a specific column
print(df)
ages = df['Age']
print("Ages of individuals:")
print(ages)



      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
Ages of individuals:
0    25
1    30
2    35
Name: Age, dtype: int64


In [None]:
# Selecting rows where Age is greater than 30
older_people = df[df['Age'] > 30]
print("Individuals with Age greater than 30:")
print(older_people)


Individuals with Age greater than 30:
      Name  Age         City
2  Charlie   35  Los Angeles
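
Boolean filters like the one above can be combined. The sketch below rebuilds the same three-row table under the hypothetical name `people` so that it runs on its own, then combines two conditions with `&` and selects a single column with `.loc`.

```python
import pandas as pd

# Same sample data as above, rebuilt so the example is self-contained
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles'],
})

# Each condition needs its own parentheses; use & for "and", | for "or"
mask = (people['Age'] >= 30) & (people['City'] != 'Los Angeles')
selected = people[mask]

# .loc takes a row filter and a column name together
names_only = people.loc[people['Age'] > 25, 'Name']

print(selected)
print(names_only)
```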


## Adding, Modifying, and Removing Data

In [None]:
# Adding a new column
df['Salary'] = [50000, 60000, 70000]
print("DataFrame with Salary column added:")
print(df)


DataFrame with Salary column added:
      Name  Age           City  Salary
0    Alice   25       New York   50000
1      Bob   30  San Francisco   60000
2  Charlie   35    Los Angeles   70000


In [None]:
# Modifying the values of the 'Age' column
df['Age'] = df['Age'] + 1
print("Updated Ages:")
print(df)

Updated Ages:
      Name  Age           City  Salary
0    Alice   26       New York   50000
1      Bob   31  San Francisco   60000
2  Charlie   36    Los Angeles   70000
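
New columns can also be derived from existing ones rather than typed in by hand. This is a minimal sketch on a hypothetical `staff` table; the 10% bonus rate is an arbitrary assumption chosen for the example.

```python
import pandas as pd

# Hypothetical staff table (illustrative data only)
staff = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]})

staff['Bonus'] = staff['Salary'] * 0.10   # derived column (assumed 10% rate)
staff['Salary'] = staff['Salary'] + 1000  # element-wise update of a column

# assign() returns a new DataFrame with the extra column; staff is unchanged
with_total = staff.assign(Total=staff['Salary'] + staff['Bonus'])

print(with_total)
```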


In [None]:
# Removing a column

# The axis parameter specifies whether to remove rows or columns:
#   axis=0 refers to rows.
#   axis=1 refers to columns.
# Without axis=1, Pandas would try to drop a row labeled 'Salary',
# because axis=0 is the default for drop().
df = df.drop('Salary', axis=1)
print(df)


      Name  Age           City
0    Alice   26       New York
1      Bob   31  San Francisco
2  Charlie   36    Los Angeles


In [None]:
# Dropping a row (the row whose index label is 1)
df = df.drop(1, axis=0)

print(df)


      Name  Age         City
0    Alice   26     New York
2  Charlie   36  Los Angeles
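
As an alternative to the `axis` argument, `drop()` also accepts the `columns=` and `index=` keywords, which some readers find easier to follow. The sketch below uses a hypothetical `people` table. The `reset_index(drop=True)` step is optional and only renumbers the remaining rows.

```python
import pandas as pd

# Hypothetical table mirroring the example above (illustrative data only)
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [26, 31, 36],
    'Salary': [50000, 60000, 70000],
})

no_salary = people.drop(columns='Salary')   # same as drop('Salary', axis=1)
no_bob = people.drop(index=1)               # same as drop(1, axis=0)
renumbered = no_bob.reset_index(drop=True)  # index becomes 0, 1 again

print(no_salary)
print(no_bob)
print(renumbered)
```

Each call returns a new DataFrame, so `people` itself still has all three rows and columns.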
