![image.png](attachment:d82157f9-0ab5-4ad3-8d0b-edc4dd8f88d9.png)


👋 Hello, dear User! Welcome to this notebook, where we will explore the powerful tools of NumPy and Pandas in the realm of data manipulation and analysis. Whether you're a data enthusiast, a researcher, or a programmer, these libraries are essential companions in your journey through the world of data. 🚀

## What is NumPy? 🧮

NumPy, short for "Numerical Python," is a fundamental library for numerical computing in Python. It introduces the concept of N-dimensional arrays, which allows efficient handling of large datasets and enables lightning-fast mathematical operations. NumPy is the foundation of countless Python libraries, providing the backbone for scientific computing and data analysis. Its efficiency, ease of use, and extensive functionality make it a must-have tool for any data professional. 📊🔢

## What is Pandas? 🐼

Pandas is another influential library in the Python data ecosystem. It provides high-performance data structures, primarily the DataFrame, that are designed for data manipulation and analysis. With Pandas, you can clean, transform, and analyze datasets with very little code. Its intuitive syntax makes it a popular choice for data wrangling and for working with tabular data. 📈📋

## Why are they required? ❓

NumPy and Pandas are essential in modern data analysis and scientific computing for various reasons:

- Efficient Numerical Computing: NumPy's array operations run in optimized compiled code and are typically much faster than equivalent loops over Python lists, which makes NumPy well suited to heavy numerical computation. 🚀🚀

- N-Dimensional Data Handling: NumPy's multi-dimensional arrays enable the manipulation of large datasets effortlessly, which is crucial in data analysis. 📈🔢

- Comprehensive Data Analysis: Pandas simplifies data analysis tasks by providing powerful data structures like DataFrames that can handle diverse data types and perform complex operations efficiently. 📊🔍

- Data Cleaning and Transformation: Pandas offers an extensive set of functions to clean, reshape, and transform data, allowing you to prepare data for analysis effectively. 🧹🔄
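
To make the first point above concrete, here is a minimal sketch comparing a list comprehension with a single vectorized array expression. The variable names are made up for this illustration, and no timing claim is made for any particular machine.

```python
import numpy as np

values = list(range(1, 6))   # plain Python list
arr = np.array(values)       # the same data as an ndarray

# List: element-wise math needs an explicit Python-level loop
squares_list = [v * v for v in values]

# ndarray: one expression is applied to every element in compiled code
squares_arr = arr * arr

print(squares_list)
print(squares_arr)
```

Both approaches give the same numbers; the array version avoids the per-element Python loop, which is where most of the speed difference comes from on large inputs.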

## In this notebook: 📓

Throughout this notebook, we will delve into the vast capabilities of NumPy and Pandas. We'll explore more than 70 functions, covering a wide range of topics from array creation, slicing, and reshaping with NumPy to data manipulation, grouping, and merging with Pandas. By the end, you'll have a solid understanding of these libraries and be better equipped to tackle real-world data challenges. 🧠💪

Let's embark on this exciting journey together! If you have any questions or need assistance, feel free to reach out. Happy exploring and coding! 🚀🎉


# 📦 Importing Dependencies


In [231]:
!pip install numpy
!pip install pandas



![image.png](attachment:41a8cf79-79af-4e47-befd-ca80fbd7ba57.png)

# NumPy 🧮

In [232]:
import numpy as np

# Array ⚙️

In [233]:
# Creating array with list

lst = [1,2,3,4,5]
arr  = np.array(lst)

In [234]:
print(type(lst))
lst

<class 'list'>


[1, 2, 3, 4, 5]

In [235]:
print(type(arr))
arr

<class 'numpy.ndarray'>


array([1, 2, 3, 4, 5])

In [236]:
# Dimensions of the array
arr.ndim

1

In [237]:
# creating multi dimensions array

a=np.array([[1,2,3],[4,5,6],[7,8,9]])

In [238]:
print(a)
a.ndim

[[1 2 3]
 [4 5 6]
 [7 8 9]]


2

# Zeros 0️⃣


In [239]:
# Array of zero

zero = np.zeros((6,5))
zero

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

# Ones 1️⃣


In [240]:
# Array of one

ones = np.ones(5)
ones

array([1., 1., 1., 1., 1.])

# Empty 🈳


In [241]:
# np.empty allocates an array without initializing it: the values are whatever was already in memory (arbitrary, not guaranteed)

np.empty(5)            

array([1., 1., 1., 1., 1.])
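
Because the contents of `np.empty` are arbitrary, a common pattern is to treat it as a buffer and overwrite every element before reading it. A minimal sketch (the fill value 7.0 is just an example):

```python
import numpy as np

buf = np.empty(5)          # uninitialized: do not rely on these values
buf[:] = 7.0               # overwrite every element before use

filled = np.full(5, 7.0)   # one-step alternative when the fill value is known

print(buf)
print(filled)
```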

# Arange 📏


In [242]:
# Array of ordered elements (0 up to, but not including, 5)

np.arange(5)

array([0, 1, 2, 3, 4])

# Eye 👁️


In [243]:
# Ones on the diagonal, zeros elsewhere (identity matrix when square)

np.eye(6)  

array([[1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1.]])

In [244]:
np.eye(2,5)    

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.]])

# Linspace ↔️


In [245]:
# linspace: `num` evenly spaced values from start to stop (both included)

np.linspace(0,20,num =5)

array([ 0.,  5., 10., 15., 20.])
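
`linspace` fixes the number of points, while `arange` fixes the step size. The short sketch below contrasts the two; `retstep` and `endpoint` are standard `np.linspace` parameters, and the variable names are only illustrative.

```python
import numpy as np

# Ask linspace to also return the spacing it computed
points, step = np.linspace(0, 20, num=5, retstep=True)

# endpoint=False leaves out the stop value
half_open = np.linspace(0, 20, num=5, endpoint=False)

# arange takes a step instead of a count and never includes stop
by_step = np.arange(0, 20, 5)

print(points, step)
print(half_open)
print(by_step)
```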

# Random 🎲


In [246]:
# Random values between 0 to 1

np.random.rand(5,5) 

array([[0.27384319, 0.35116898, 0.97463033, 0.39331135, 0.70350852],
       [0.0802039 , 0.39185923, 0.06346904, 0.24598211, 0.12105553],
       [0.06572831, 0.68731381, 0.24116264, 0.31059402, 0.11298817],
       [0.57944578, 0.25410314, 0.41773626, 0.28188934, 0.44718718],
       [0.69685047, 0.76408653, 0.40040406, 0.36643406, 0.40574243]])

In [247]:
# Random values drawn from the standard normal distribution (mean 0, standard deviation 1); values are not limited to [-1, 1]

np.random.randn(5,5)     

array([[ 1.43393024,  0.52925217, -1.02038512,  0.9135285 ,  0.89143768],
       [ 0.94080995,  0.06183431,  0.75628901,  0.47459947,  0.3819635 ],
       [-1.2441598 , -0.54634661, -0.63261164,  0.83545718,  0.21181816],
       [ 0.83293946, -0.73282562, -0.06499845,  2.0221999 , -0.53941682],
       [-0.01008879, -1.50511073, -0.30996359,  0.29438973,  1.33367058]])

In [248]:
# Random values between 0 and 1, with 1 not included

np.random.ranf(4)   

array([0.4972821 , 0.50924891, 0.53919875, 0.42565837])

In [249]:
# Random integers in a fixed range: low is included, high is excluded (here 1 to 9)

np.random.randint(1,10,5)  

array([4, 3, 9, 7, 9])
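
The functions above use NumPy's legacy global random state. As a hedged alternative, the sketch below uses the `Generator` API (`np.random.default_rng`), which makes results reproducible through an explicit seed. The seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seeded generator: same numbers on every run

uniform = rng.random((2, 3))           # floats in [0, 1)
normal = rng.standard_normal(1000)     # standard normal samples
ints = rng.integers(1, 10, size=5)     # integers from 1 to 9

print(uniform)
print(normal.mean(), normal.std())
print(ints)
```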

In [250]:
# arange generates an integer array by default

innt = np.arange(4)
innt.dtype

dtype('int64')

# Typecasting 🔤


In [251]:
# Changing it to float

new = np.float32(innt)
new.dtype

dtype('float32')

In [252]:
new # Results

array([0., 1., 2., 3.], dtype=float32)

In [253]:
# Another method

new_s = new.astype(int)

In [254]:
new_s # Results

array([0, 1, 2, 3])

# Arithmetic Operations ➕➖✖️➗


In [255]:
a = np.array([1,2,3,4,5])

add = a+4

add     # 4 is added to every element; -, *, /, **, % and 1/a work element-wise in the same way

array([5, 6, 7, 8, 9])

In [256]:
# Simple addition in array

v1 = np.array([1,2,3])
v2 = np.array([1,2,3])

add= v1+v2
add

array([2, 4, 6])

In [257]:
a = np.array([[1,2,3],[4,5,6]])
b = np.array([[1,2,3],[4,5,6]])

In [258]:
# Function forms of the element-wise operators +, -, *, /, % and **

print(np.add(a,b))
print()
print(np.subtract(a,b))
print()
print(np.multiply(a,b))
print()
print(np.divide(a,b))
print()
print(np.mod(a,b))
print()
print(np.power(a,b))
print()
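# reciprocal takes one input; a second positional argument would be used as the output array and overwrite b
print(np.reciprocal(a))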

[[ 2  4  6]
 [ 8 10 12]]

[[0 0 0]
 [0 0 0]]

[[ 1  4  9]
 [16 25 36]]

[[1. 1. 1.]
 [1. 1. 1.]]

[[0 0 0]
 [0 0 0]]

[[    1     4    27]
 [  256  3125 46656]]

[[1 0 0]
 [0 0 0]]
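
The last result above is mostly zeros because `np.reciprocal` keeps the integer dtype and truncates. A small sketch of how the dtype changes the outcome (variable names are illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

int_recip = np.reciprocal(a)                    # integer dtype: results are truncated
float_recip = np.reciprocal(a.astype(float))    # float dtype: true reciprocals

true_div = a / 2     # "/" always produces floats
floor_div = a // 2   # "//" keeps integers and rounds down

print(int_recip)
print(float_recip)
print(true_div.dtype, floor_div.dtype)
```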


# Statistical Function 📊


In [259]:
# basic function on numpy array

arr = np.array([1,2,3,5])

print("min: ", np.min(arr), "at index: ", np.argmin(arr))
print("max: ", np.max(arr), "at index: ", np.argmax(arr))
print("sqrt: ", np.sqrt(arr))

min:  1 at index:  0
max:  5 at index:  3
sqrt:  [1.         1.41421356 1.73205081 2.23606798]


In [260]:
# Min Max Function of numpy on row and column

arr = np.array([[1,2,3],[4,0,6]])
print(arr,"\n")
print("row min: ", np.min(arr,axis=1))
print("column min: ", np.min(arr,axis=0))

[[1 2 3]
 [4 0 6]] 

row min:  [1 0]
column min:  [1 0 3]


In [261]:
# Trigonometric functions (inputs are interpreted as radians)

print(np.sin(arr))
print()
print(np.cos(arr))

[[ 0.84147098  0.90929743  0.14112001]
 [-0.7568025   0.         -0.2794155 ]]

[[ 0.54030231 -0.41614684 -0.9899925 ]
 [-0.65364362  1.          0.96017029]]


# Cumsum Σ


In [262]:
# cumsum returns the running total; without an axis the array is flattened first

print(np.cumsum(arr))

[ 1  3  6 10 10 16]
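
With a 2D array, `cumsum` can also accumulate along one axis instead of over the flattened data. A minimal sketch using the same array as above:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 0, 6]])

running = np.cumsum(arr)          # flattened running total
down = np.cumsum(arr, axis=0)     # accumulate down each column
across = np.cumsum(arr, axis=1)   # accumulate across each row

print(running)
print(down)
print(across)
```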


In [263]:
# Array Shape

arr = np.array([[1,2,3],[4,0,6]])
arr.shape

(2, 3)

# Reshape 🔄


In [264]:
# Array Reshaping 1D to 2D

var2 = np.array([1,2,3,4,5,6])
print(var2)
print (var2.ndim)
print()
x = var2.reshape (3,2)

print(x)
print(x.ndim)

[1 2 3 4 5 6]
1

[[1 2]
 [3 4]
 [5 6]]
2


In [265]:
# Array Reshaping 1D to 3D

var3 = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
print (var3)

print(var3.ndim)
print ()
x1 = var3.reshape (2,3,2)
print(x1)
print(x1.ndim)

[ 1  2  3  4  5  6  7  8  9 10 11 12]
1

[[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]]
3


In [266]:
# Array Reshaping 3D to 1D

var3 = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
print (var3)

print(var3.ndim)
print ()

x1 = var3.reshape (2,3,2)
print(x1)
print(x1.ndim)

x1 = x1.reshape (-1)   # flatten the 3D array back to 1D
print(x1)
print(x1.ndim)

[ 1  2  3  4  5  6  7  8  9 10 11 12]
1

[[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]]
3
[ 1  2  3  4  5  6  7  8  9 10 11 12]
1
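
In `reshape`, a single `-1` asks NumPy to infer that dimension from the total number of elements. The sketch below also shows that reshaping a contiguous array typically returns a view, so edits are visible through both names. Variable names are illustrative.

```python
import numpy as np

var3 = np.arange(1, 13)          # 12 elements

rows = var3.reshape(3, -1)       # -1 is inferred as 4
cube = var3.reshape(2, 3, -1)    # -1 is inferred as 2
flat = cube.reshape(-1)          # back to 1D

print(rows.shape, cube.shape, flat.shape)

# The reshaped array shares memory with the original here
cube[0, 0, 0] = 100
print(var3[0])
```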


# Broadcasting 📡


In [267]:
# broadcasting

var1= np.array([1,2,3])
print(var1.shape)
print()
print (var1)
print()
var2 = np.array([[1], [2], [3]])
print(var2.shape)
print()
print(var2)

print()

print(var1 + var2)

(3,)

[1 2 3]

(3, 1)

[[1]
 [2]
 [3]]

[[2 3 4]
 [3 4 5]
 [4 5 6]]
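
Broadcasting compares shapes from the last dimension backwards; two dimensions are compatible when they are equal or one of them is 1. The following sketch checks the resulting shape, shows a typical use (subtracting column means), and shows what happens with incompatible shapes. The `data` array is made up for the example.

```python
import numpy as np

row = np.array([1, 2, 3])          # shape (3,)
col = np.array([[1], [2], [3]])    # shape (3, 1)

# (3,) and (3, 1) broadcast to (3, 3)
result_shape = np.broadcast(row, col).shape
print(result_shape)

# Typical use: subtract each column's mean from that column
data = np.array([[1.0, 10.0], [3.0, 30.0]])
centered = data - data.mean(axis=0)   # (2, 2) minus (2,)
print(centered)

# Shapes (3,) and (2,) are not compatible
try:
    np.array([1, 2, 3]) + np.array([1, 2])
    failed = False
except ValueError as err:
    failed = True
    print("broadcast error:", err)
```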


# Indexing 🔢


In [268]:
# Array Indexing

var = np.array([9,8,7,6])
              # 0,1,2,3
            #-4,-3,-2,-1
print (var[1])
print (var[-3])

8
8


In [269]:
# Indexing in 3D (the nested brackets make this array three-dimensional)

var2 =np.array([[[1,2], [6,7]]])
print(var2)
print (var2.ndim)
print ()
print (var2 [0,1,1])

[[[1 2]
  [6 7]]]
3

7


# Slicing 🍰


In [270]:
# Slicing

var = np.array([9,8,7,6,5,4,3])
              # 0,1,2,3,4,5,6
print (var)
print ()
print("8 to 5 : ", var [1:5])
print("8 to End : ", var [1:])
print("start to 5 : ", var[:5])
print("1 Skip", var [0:7:2])

[9 8 7 6 5 4 3]

8 to 5 :  [8 7 6 5]
8 to End :  [8 7 6 5 4 3]
start to 5 :  [9 8 7 6 5]
1 Skip [9 7 5 3]


In [271]:
# Slicing in 2-D

var1 = np.array([[1,2,3,4,5],[9,8,7,6,5], [11,12,13,14,15]])
#                     0          1                 2               {index}
print(var1)
print()
print("3 row: ",var1[2,:])

[[ 1  2  3  4  5]
 [ 9  8  7  6  5]
 [11 12 13 14 15]]

3 row:  [11 12 13 14 15]


# Iterations 🔁


In [272]:
# iteration moving in 3D 

var3 = np.array([[[9,8,7,6],[1,2,3,4]]])
print (var3)
print(var3.ndim)
print ()

for i in var3:
    for j in i:
        for k in j:
            print (k)   

[[[9 8 7 6]
  [1 2 3 4]]]
3

9
8
7
6
1
2
3
4


In [273]:
var3 = np.array([[[9,8,7,6],[1,2,3,4]]])
print(var3)
print(var3.ndim)
print()
for i in np.nditer (var3, flags=['buffered' ], op_dtypes=["S"]):
    print(i)

    # flags=['buffered'] with op_dtypes=["S"] converts each element to bytes; omit both to iterate over the original integers

[[[9 8 7 6]
  [1 2 3 4]]]
3

b'9'
b'8'
b'7'
b'6'
b'1'
b'2'
b'3'
b'4'


# Ndenumerate 🔄🔢


In [274]:
# ndenumerate yields the index tuple of each value along with the value

var3 = np.array([[[9,8,7,6],[1,2,3,4]]])
print(var3)
print (var3.ndim)
print()
for i,d in np.ndenumerate (var3):
    print(i, d)

[[[9 8 7 6]
  [1 2 3 4]]]
3

(0, 0, 0) 9
(0, 0, 1) 8
(0, 0, 2) 7
(0, 0, 3) 6
(0, 1, 0) 1
(0, 1, 1) 2
(0, 1, 2) 3
(0, 1, 3) 4


# Copy 📝


In [275]:
# Array Copy

var = np.array([1,2,3,4])
co = var.copy ()
var [1]=40
print("var: ",var)
print("copy",co)

var:  [ 1 40  3  4]
copy [1 2 3 4]


# View 👀


In [276]:
# A view refers to the same memory as the original array, so changes show up in both

x= np.array([9,8,7,6,5])
vi = x.view()
x[1]=40
print("x : ",x)
print("view: ",vi)

x :  [ 9 40  7  6  5]
view:  [ 9 40  7  6  5]
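
One way to check whether two arrays share data is `np.shares_memory`, or the `.base` attribute of the derived array. A minimal sketch; note that basic slices are views as well:

```python
import numpy as np

x = np.array([9, 8, 7, 6, 5])
vi = x.view()    # view: shares data with x
co = x.copy()    # copy: owns its own data
sl = x[1:4]      # basic slicing also returns a view

print(np.shares_memory(x, vi))   # True
print(np.shares_memory(x, co))   # False
print(np.shares_memory(x, sl))   # True
print(vi.base is x, co.base is None)

sl[0] = 40       # writing through the slice changes x
print(x)
```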


# Concatenate ➕


In [277]:
# Joining Array concatenate function

var = np.array([1,2,3,4])
var1 = np.array([9,8,7,6])
ar = np.concatenate((var, var1))
print (ar)

[1 2 3 4 9 8 7 6]


In [278]:
# For 2D
# axis=0 joins along rows (arrays are stacked vertically)
# axis=1 joins along columns (arrays are placed side by side)

vr = np.array([[1,2],[3,4]])
vr1 = np.array([[9,8],[7,6]])
ar_new = np. concatenate((vr, vr1), axis=1)
print (vr)
print()
print (vr1)
print ()
print (ar_new)

[[1 2]
 [3 4]]

[[9 8]
 [7 6]]

[[1 2 9 8]
 [3 4 7 6]]
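
A quick way to keep the two axes apart is to look at the resulting shapes. A small sketch with the same two arrays:

```python
import numpy as np

vr = np.array([[1, 2], [3, 4]])
vr1 = np.array([[9, 8], [7, 6]])

top_bottom = np.concatenate((vr, vr1), axis=0)   # more rows
side_by_side = np.concatenate((vr, vr1), axis=1) # more columns

print(top_bottom.shape)     # (4, 2)
print(side_by_side.shape)   # (2, 4)
```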


# Stack 📚


In [279]:
# Joining Array Stack function

var_1= np.array([1,2,3,4])
var_2 = np.array([9,8,7,6])
a_new = np.stack((var_1, var_2), axis = 0)
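a_new1 = np.hstack((var_1, var_2)) # 1D inputs are joined end to end -> shape (8,)
a_new2 = np.vstack((var_1, var_2)) # each input becomes a row -> shape (2, 4)
a_new3 = np.dstack((var_1, var_2)) # stacked along a third (depth) axis -> shape (1, 4, 2)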
print (a_new)
print()
print (a_new1)
print() 
print (a_new2)
print ()
print(a_new3)

[[1 2 3 4]
 [9 8 7 6]]

[1 2 3 4 9 8 7 6]

[[1 2 3 4]
 [9 8 7 6]]

[[[1 9]
  [2 8]
  [3 7]
  [4 6]]]


# Split 🍴


In [280]:
# Array split

var = np.array([1,2,3,4,5,6])
print (var)
ar = np.array_split(var, 3)
print()
print (ar)
print (ar[0])

[1 2 3 4 5 6]

[array([1, 2]), array([3, 4]), array([5, 6])]
[1 2]


In [281]:
# Array 2D split

var1 = np.array([[1,2], [3,4], [5,6]])
print (var1)
ar1= np.array_split(var1,3)
ar2 = np.array_split(var1,3,axis=1)
print ()
print (ar1)
print ()
print (ar2)

[[1 2]
 [3 4]
 [5 6]]

[array([[1, 2]]), array([[3, 4]]), array([[5, 6]])]

[array([[1],
       [3],
       [5]]), array([[2],
       [4],
       [6]]), array([], shape=(3, 0), dtype=int64)]


# Where ❓


In [282]:
# where function condition

arr = np.array([1,2,3,4,5,6,2,4,6,8])
x= np.where((arr%2) == 0)
print(x)

(array([1, 3, 5, 6, 7, 8, 9]),)


In [283]:
arr

array([1, 2, 3, 4, 5, 6, 2, 4, 6, 8])
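
With one argument, `np.where` returns the indices where the condition holds. With three arguments it picks between two values element by element. A minimal sketch on the same array (the "even"/"odd" labels are just for illustration):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 2, 4, 6, 8])
is_even = arr % 2 == 0

indices = np.where(is_even)[0]              # positions of the even values
labels = np.where(is_even, "even", "odd")   # choose a label per element
evens = arr[is_even]                        # boolean mask selects the values

print(indices)
print(labels)
print(evens)
```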

# Searchsorted 🔍


In [284]:
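# searchsorted returns the positions where new values should be inserted to keep the order.
# The input must already be sorted, so sort first; side="right" places a value after any equal elements.

x1 = np.searchsorted(np.sort(arr), [8,9], side = "right")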
x1

array([10, 10])
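
The `side` argument only matters when the value is already present. The sketch below, which assumes the input is sorted first, shows both sides for the value 4 and then uses the index to insert while keeping the order.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 2, 4, 6, 8])
sorted_arr = np.sort(arr)    # [1 2 2 3 4 4 5 6 6 8]

left = np.searchsorted(sorted_arr, 4, side="left")    # before the existing 4s
right = np.searchsorted(sorted_arr, 4, side="right")  # after the existing 4s

inserted = np.insert(sorted_arr, left, 4)   # still sorted after insertion

print(left, right)
print(inserted)
```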

In [285]:
# Sorting array

np.sort(arr)

array([1, 2, 2, 3, 4, 4, 5, 6, 6, 8])

In [286]:
# Sorting strings (alphabetical order)

arr = np.array(['a','s','d','i'])
np.sort(arr)

array(['a', 'd', 'i', 's'], dtype='<U1')

In [287]:
# Boolean mask: keeps the elements where the mask is True

f = [True, False,True ,True]
new = arr[f]
new

array(['a', 'd', 'i'], dtype='<U1')

# Shuffle 🔀


In [288]:
# Shuffles the array in place (the order differs on every run)

var = np.array([1,2,3,4,5])
np.random.shuffle(var)
var

array([3, 2, 1, 5, 4])

# Unique 🆕


In [289]:
# Returns the unique elements, the index of each one's first occurrence, and how often it occurs

var = np.array([1,2,3,2,2,6,4,5,2,8,2,1,4,5])
x = np.unique(var, return_counts=True, return_index =True)
x

(array([1, 2, 3, 4, 5, 6, 8]),
 array([0, 1, 2, 6, 7, 5, 9]),
 array([2, 5, 1, 2, 2, 1, 1]))

# Resize 📐


In [290]:
# resize Array 1D to 2D

var2= np.array([1,2,3,4,5,6])
y= np.resize(var2,(2,3))
y

array([[1, 2, 3],
       [4, 5, 6]])

# Flatten ⏹️


In [291]:
# flatten converting 2D to 1D

y.flatten(order = "f")

array([1, 4, 2, 5, 3, 6])

# Ravel 🔄


In [292]:
# ravel function 2D to 1D

np.ravel(y,order = "f")

array([1, 4, 2, 5, 3, 6])

# Insert ➕


In [293]:
# insert returns a new array; inserted values are cast to the array's dtype, so 6.4 becomes 6 in this int array

var = np.array([1,2,3,4])
v = np.insert(var, (2,4), 6.4)

v

array([1, 2, 6, 3, 4, 6])

# Append ➕


In [294]:
# append returns a new array and upcasts the dtype when needed (int array + float value -> float array)

x= np.append(var, 80.2)
x

array([ 1. ,  2. ,  3. ,  4. , 80.2])
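
The difference between the two results above comes down to dtype handling: `np.insert` casts the new values to the array's dtype, while `np.append` upcasts the result. A small sketch, with one possible way to keep the decimals when inserting:

```python
import numpy as np

var = np.array([1, 2, 3, 4])

as_int = np.insert(var, (2, 4), 6.4)                  # 6.4 is cast to int
as_float = np.insert(var.astype(float), (2, 4), 6.4)  # convert first to keep 6.4
appended = np.append(var, 80.2)                       # result is float64

print(as_int)
print(as_float)
print(appended.dtype)
print(var)   # the original array is unchanged in every case
```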

In [295]:
# 2D append

var = np.array([[1,2,3],[1,2,3]])
v = np.append(var, [[45,25,64]],axis=0)
v

array([[ 1,  2,  3],
       [ 1,  2,  3],
       [45, 25, 64]])

# Delete ❌


In [296]:
# deleting from index

var = np.array([1,2,3,4])
x = np.delete(var,2)
x

array([1, 2, 4])

# Matrix ⚙️


In [297]:
# creating a matrix (np.matrix still works, but the NumPy documentation recommends regular 2D arrays instead)

var = np.matrix([[1,2],[1,2]])
var1 = np.matrix([[1,2],[1,2]])
print(var)
print(type(var))

[[1 2]
 [1 2]]
<class 'numpy.matrix'>


# Dot Product ⬜⬛


In [298]:
# dot product

var.dot(var1)

matrix([[3, 6],
        [3, 6]])

# Transpose 🔄


In [299]:
# Transpose

np.transpose(var)

matrix([[1, 1],
        [2, 2]])

In [300]:
# Second way to transpose a matrix

var.T

matrix([[1, 1],
        [2, 2]])

# Swapaxes ↔️


In [301]:
# swaps two axes; for a 2D matrix this is the same as a transpose

np.swapaxes(var,0,1)

matrix([[1, 1],
        [2, 2]])

# Inverse 🔄⁻¹


In [302]:
# inverse of matrix

var = np.array([[25,58],[54,85]])

np.linalg.inv(var)

array([[-0.08440914,  0.05759682],
       [ 0.05362463, -0.02482622]])

# Power ⏹️²


In [303]:
# np.power squares each element; np.linalg.matrix_power multiplies the matrix by itself

var = np.array([[2,5],[4,3]])
print(np.power(var,2))
np.linalg.matrix_power(var, 2)

[[ 4 25]
 [16  9]]


array([[24, 25],
       [20, 29]])

# Determinant 📐


In [304]:
# Determinant of matrix

np.linalg.det(var)

-14.000000000000004
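
The same linear algebra can be done with plain 2D arrays and the `@` operator, without `np.matrix`. A minimal sketch that also checks the inverse numerically; because of floating-point rounding, comparisons use `np.allclose` / `np.isclose` rather than `==`.

```python
import numpy as np

A = np.array([[2, 5], [4, 3]])

elementwise = A * A    # element-by-element product
matmul = A @ A         # matrix product, same as np.linalg.matrix_power(A, 2)

A_inv = np.linalg.inv(A)
is_identity = np.allclose(A @ A_inv, np.eye(2))   # A times its inverse is I

det = np.linalg.det(A)   # 2*3 - 5*4 = -14, up to rounding

print(elementwise)
print(matmul)
print(is_identity, det)
```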

![image.png](attachment:f9b275d8-ae5a-433d-b7b3-c44d7f9dbcfc.png)

# Pandas 🐼


In [305]:
import pandas as pd


# Series 🔤


In [307]:
# Creating a Series 

x=[1,2,3,4,5]
n = pd.Series(x)
print(n[3])
n
print(type(n))

4
<class 'pandas.core.series.Series'>


In [308]:
# custom index, data type, name 

v = pd.Series(x, index=["a","v","s","d","r"],dtype="float",name="custom index")
v

a    1.0
v    2.0
s    3.0
d    4.0
r    5.0
Name: custom index, dtype: float64

In [309]:
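# creating series with dictionary (the keys become the index)
# the variable is not called `dict`, to avoid shadowing the built-in type

lang_info = {"name": ["python","c++"], 
       "rank": [1,2]}

c= pd.Series(lang_info)
c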

name    [python, c++]
rank           [1, 2]
dtype: object

In [310]:
# series as per desired index

s= pd.Series(13, index=[1,2,3,4,5])
s

1    13
2    13
3    13
4    13
5    13
dtype: int64

In [311]:
# adding two series

s= pd.Series(13, index=[1,2,3,4,5])
ss= pd.Series(13, index=[1,2,3])
s+ss

1    26.0
2    26.0
3    26.0
4     NaN
5     NaN
dtype: float64

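- Adding two Series aligns them on their index labels: labels missing from either Series produce NaN, instead of the shape-mismatch (broadcasting) error that NumPy raises for arrays of different lengths.
- A Series holds only one-dimensional data.
- For tabular, two-dimensional data, let's look at the DataFrame.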
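
If the NaN results from index alignment are not wanted, one option is the `add` method with `fill_value`, which treats a missing label as the given value. A minimal sketch with the same two Series:

```python
import pandas as pd

s = pd.Series(13, index=[1, 2, 3, 4, 5])
ss = pd.Series(13, index=[1, 2, 3])

plain = s + ss                      # labels 4 and 5 become NaN
filled = s.add(ss, fill_value=0)    # missing labels count as 0

print(plain)
print(filled)
```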

# DataFrame 📊📋


In [313]:
# Creating Dataframe

l = [1,2,3,4,5]
f = pd.DataFrame(l)

print(type(f))
f

<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,0
0,1
1,2
2,3
3,4
4,5


In [314]:
# Dataframe using dictionary (only the "name" column is kept)

lang_info = {"name": ["python","c++"], 
       "rank": [1,2]}
v = pd.DataFrame(lang_info, columns=["name"],index = ["p","d"])
print(v)

# Access an element by row label and column name
v.loc["d","name"]

     name
p  python
d     c++


'c++'

In [315]:
# 2D dataframe

l = [[1,2,3,4],[1,2,3,4]]
l = pd.DataFrame(l)
l

Unnamed: 0,0,1,2,3
0,1,2,3,4
1,1,2,3,4


In [316]:

# Dataframe from a dictionary of Series

lang_info = {"name": pd.Series(["python","c++"]), 
       "rank": pd.Series([1,2])}
lang_df = pd.DataFrame(lang_info)
lang_df


Unnamed: 0,name,rank
0,python,1
1,c++,2


# Arithmetic Operations ➕➖✖️➗


In [586]:
# Arithmetic between two columns creates new columns

d= pd.DataFrame({"a": [1,2,3,4,5,6],
                "b": [1,2,3,4,5,6]})
# Add

d["add"] = d["a"] + d["b"]
# Subtract

d["sub"] = d["a"] - d["b"]
# Multiply

d["mul"] = d["a"] * d["b"]
# Division

d["div"] = d["a"] / d["b"]
# Custom condition (Boolean column)

d["custom"] = d["a"] >2
d

Unnamed: 0,a,b,add,sub,mul,div,custom
0,1,1,2,0,1,1.0,False
1,2,2,4,0,4,1.0,False
2,3,3,6,0,9,1.0,True
3,4,4,8,0,16,1.0,True
4,5,5,10,0,25,1.0,True
5,6,6,12,0,36,1.0,True


# Insert ➕


In [322]:
# Inserting a column at position 1, then adding a column from a partial slice (rows without a value become NaN)

vr = pd.DataFrame({"a": [1,2,3,4,5],"b": [1,2,3,4,5]})
vr.insert(1,"python", [1,2,3,4,5])
vr["condition"] = vr["a"][:4]
vr

Unnamed: 0,a,python,b,condition
0,1,1,1,1.0
1,2,2,2,2.0
2,3,3,3,3.0
3,4,4,4,4.0
4,5,5,5,


In [323]:
vr = pd.DataFrame({"a": [1,2,3,4,5],"b": [1,2,3,4,5],"c": [1,2,3,4,5]})


# Pop 🍿

In [324]:
# pop removes a column from the DataFrame and returns it as a Series

var = vr.pop("b")
var

0    1
1    2
2    3
3    4
4    5
Name: b, dtype: int64

In [325]:
vr

Unnamed: 0,a,c
0,1,1
1,2,2
2,3,3
3,4,4
4,5,5


# Delete ❌


In [326]:
# Deleting feature

del vr["c"]

In [327]:
# var still holds the popped column b, so it can be added back to vr

vr["b"] = var
vr

Unnamed: 0,a,b
0,1,1
1,2,2
2,3,3
3,4,4
4,5,5


# DataFrame to CSV 💾


In [328]:
# DataFrame to csv

vr = pd.DataFrame({"a": [1,2,3,4,5],"b": [1,2,3,4,5],"c": [1,2,3,4,5]})

vr
vr.to_csv("data.csv", index=False,header=["col1","col2","col3"])

# Dataset📁: Using Numpy ➡️🔢


In [329]:
# Creating a small dataset with NumPy and saving it as CSV, so we can practice the pandas CSV functions

In [330]:
import numpy as np

# Fixed list of student names
names = np.array(['Arjun', 'Aarav', 'Akshay', 'Bhavya', 'Chahat', 'Deepika', 'Esha', 'Farhan', 'Gauri', 'Harshita'])

# Generate random grades
grades = np.random.choice(['A', 'B', 'C'], size=10)

# Generate total marks based on grades
total_marks = np.where(grades == 'A', np.random.randint(91, 101, size=10),
                       np.where(grades == 'B', np.random.randint(81, 91, size=10),
                                np.random.randint(71, 81, size=10)))

# Fixed list of states (one per student)
states = np.array(['California', 'Texas', 'New York', 'Florida', 'Illinois', 'Pennsylvania', 'Ohio', 'Georgia', 'North Carolina', 'Michigan'])

# Combine the data into a 2D numpy array
data = np.column_stack((names, grades, total_marks, states))

# Save the data to a CSV file
file_name = "data1.csv"
header = "Name,Grade,Total_Marks,State"
np.savetxt(file_name, data, delimiter=',', header=header, fmt='%s', comments='')

print(f"CSV file '{file_name}' has been generated.")


CSV file 'data1.csv' has been generated.


# Read_CSV 📖


In [331]:
# importing csv file as dataframe

df=pd.read_csv("data1.csv")

In [332]:
df

Unnamed: 0,Name,Grade,Total_Marks,State
0,Arjun,B,82,California
1,Aarav,C,75,Texas
2,Akshay,C,78,New York
3,Bhavya,A,93,Florida
4,Chahat,C,72,Illinois
5,Deepika,A,92,Pennsylvania
6,Esha,C,77,Ohio
7,Farhan,B,87,Georgia
8,Gauri,A,97,North Carolina
9,Harshita,C,78,Michigan


In [333]:
# importing with required no. of rows

df=pd.read_csv("data1.csv",nrows=4)
df

Unnamed: 0,Name,Grade,Total_Marks,State
0,Arjun,B,82,California
1,Aarav,C,75,Texas
2,Akshay,C,78,New York
3,Bhavya,A,93,Florida


In [334]:
# Importing with desired columns

df=pd.read_csv("data1.csv",usecols=["Name","State"])
df

Unnamed: 0,Name,State
0,Arjun,California
1,Aarav,Texas
2,Akshay,New York
3,Bhavya,Florida
4,Chahat,Illinois
5,Deepika,Pennsylvania
6,Esha,Ohio
7,Farhan,Georgia
8,Gauri,North Carolina
9,Harshita,Michigan


In [335]:
# Skip file lines 1, 2 and 3 (line 0 is the header, so the first three data rows are dropped)

df=pd.read_csv("data1.csv",skiprows=[1,2,3])
df

Unnamed: 0,Name,Grade,Total_Marks,State
0,Bhavya,A,93,Florida
1,Chahat,C,72,Illinois
2,Deepika,A,92,Pennsylvania
3,Esha,C,77,Ohio
4,Farhan,B,87,Georgia
5,Gauri,A,97,North Carolina
6,Harshita,C,78,Michigan


In [336]:
# set column as index

df=pd.read_csv("data1.csv",index_col=["Name"])
df

Name,Grade,Total_Marks,State
Arjun,B,82,California
Aarav,C,75,Texas
Akshay,C,78,New York
Bhavya,A,93,Florida
Chahat,C,72,Illinois
Deepika,A,92,Pennsylvania
Esha,C,77,Ohio
Farhan,B,87,Georgia
Gauri,A,97,North Carolina
Harshita,C,78,Michigan


In [337]:
# use file line 2 as the header; the lines before it are discarded

df=pd.read_csv("data1.csv",header=2)
df

Unnamed: 0,Aarav,C,75,Texas
0,Akshay,C,78,New York
1,Bhavya,A,93,Florida
2,Chahat,C,72,Illinois
3,Deepika,A,92,Pennsylvania
4,Esha,C,77,Ohio
5,Farhan,B,87,Georgia
6,Gauri,A,97,North Carolina
7,Harshita,C,78,Michigan


In [338]:
# names= assigns new column names; without header=0 the original header line is kept as the first data row

df=pd.read_csv("data1.csv",names=["col1","col2","col3","col4"])
df

Unnamed: 0,col1,col2,col3,col4
0,Name,Grade,Total_Marks,State
1,Arjun,B,82,California
2,Aarav,C,75,Texas
3,Akshay,C,78,New York
4,Bhavya,A,93,Florida
5,Chahat,C,72,Illinois
6,Deepika,A,92,Pennsylvania
7,Esha,C,77,Ohio
8,Farhan,B,87,Georgia
9,Gauri,A,97,North Carolina
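
To actually replace the header rather than keep it as a data row, `names=` can be combined with `header=0`. The sketch below uses an in-memory CSV (`io.StringIO`) with two made-up rows so it runs without `data1.csv`.

```python
import io
import pandas as pd

csv_text = (
    "Name,Grade,Total_Marks,State\n"
    "Arjun,B,82,California\n"
    "Aarav,C,75,Texas\n"
)
new_names = ["col1", "col2", "col3", "col4"]

# names only: the original header line becomes row 0 and every column is text
kept = pd.read_csv(io.StringIO(csv_text), names=new_names)

# names + header=0: the original header line is replaced
replaced = pd.read_csv(io.StringIO(csv_text), names=new_names, header=0)

print(kept)
print(replaced)
```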


In [339]:
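# No header row: columns are numbered automatically, then given a prefix
# (the `prefix` argument of read_csv was removed in pandas 2.0; add_prefix gives the same column names)
df=pd.read_csv("data1.csv",header=None).add_prefix("col")
df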

Unnamed: 0,col0,col1,col2,col3
0,Name,Grade,Total_Marks,State
1,Arjun,B,82,California
2,Aarav,C,75,Texas
3,Akshay,C,78,New York
4,Bhavya,A,93,Florida
5,Chahat,C,72,Illinois
6,Deepika,A,92,Pennsylvania
7,Esha,C,77,Ohio
8,Farhan,B,87,Georgia
9,Gauri,A,97,North Carolina


In [340]:
# Changing datatype

df=pd.read_csv("data1.csv",dtype={"Total_Marks":"float"})
df

Unnamed: 0,Name,Grade,Total_Marks,State
0,Arjun,B,82.0,California
1,Aarav,C,75.0,Texas
2,Akshay,C,78.0,New York
3,Bhavya,A,93.0,Florida
4,Chahat,C,72.0,Illinois
5,Deepika,A,92.0,Pennsylvania
6,Esha,C,77.0,Ohio
7,Farhan,B,87.0,Georgia
8,Gauri,A,97.0,North Carolina
9,Harshita,C,78.0,Michigan


In [341]:
df=pd.read_csv("data1.csv")
df

Unnamed: 0,Name,Grade,Total_Marks,State
0,Arjun,B,82,California
1,Aarav,C,75,Texas
2,Akshay,C,78,New York
3,Bhavya,A,93,Florida
4,Chahat,C,72,Illinois
5,Deepika,A,92,Pennsylvania
6,Esha,C,77,Ohio
7,Farhan,B,87,Georgia
8,Gauri,A,97,North Carolina
9,Harshita,C,78,Michigan


# Index 🔢


In [342]:
# row index of the DataFrame
df.index

RangeIndex(start=0, stop=10, step=1)

# Columns 📇


In [343]:
# gives columns info
df.columns

Index(['Name', 'Grade', 'Total_Marks', 'State'], dtype='object')

# Info 📄


In [344]:
# gives data info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Name         10 non-null     object
 1   Grade        10 non-null     object
 2   Total_Marks  10 non-null     int64 
 3   State        10 non-null     object
dtypes: int64(1), object(3)
memory usage: 448.0+ bytes


# Shape 🔺


In [345]:
# shape of dataframe
df.shape

(10, 4)

# Describe 📋


In [346]:
# stats of numerical columns only
df.describe()

Unnamed: 0,Total_Marks
count,10.0
mean,83.1
std,8.595218
min,72.0
25%,77.25
50%,80.0
75%,90.75
max,97.0


# Head 🚀


In [347]:
# first 5 rows (the default)
df.head()

Unnamed: 0,Name,Grade,Total_Marks,State
0,Arjun,B,82,California
1,Aarav,C,75,Texas
2,Akshay,C,78,New York
3,Bhavya,A,93,Florida
4,Chahat,C,72,Illinois


# Sample 🎲


In [348]:
# random 5 samples from data
df.sample(5)

Unnamed: 0,Name,Grade,Total_Marks,State
8,Gauri,A,97,North Carolina
4,Chahat,C,72,Illinois
5,Deepika,A,92,Pennsylvania
2,Akshay,C,78,New York
9,Harshita,C,78,Michigan


# Tail 🐾


In [349]:
# last 5 rows (the default)
df.tail()

Unnamed: 0,Name,Grade,Total_Marks,State
5,Deepika,A,92,Pennsylvania
6,Esha,C,77,Ohio
7,Farhan,B,87,Georgia
8,Gauri,A,97,North Carolina
9,Harshita,C,78,Michigan


# To Numpy ➡️🔢


In [350]:
# Converting df to numpy array

df.to_numpy()

array([['Arjun', 'B', 82, 'California'],
       ['Aarav', 'C', 75, 'Texas'],
       ['Akshay', 'C', 78, 'New York'],
       ['Bhavya', 'A', 93, 'Florida'],
       ['Chahat', 'C', 72, 'Illinois'],
       ['Deepika', 'A', 92, 'Pennsylvania'],
       ['Esha', 'C', 77, 'Ohio'],
       ['Farhan', 'B', 87, 'Georgia'],
       ['Gauri', 'A', 97, 'North Carolina'],
       ['Harshita', 'C', 78, 'Michigan']], dtype=object)

# Sort Index 🔢🔍


In [351]:
# sorting rows by index label, in descending order

df.sort_index(axis=0,ascending=False)

Unnamed: 0,Name,Grade,Total_Marks,State
9,Harshita,C,78,Michigan
8,Gauri,A,97,North Carolina
7,Farhan,B,87,Georgia
6,Esha,C,77,Ohio
5,Deepika,A,92,Pennsylvania
4,Chahat,C,72,Illinois
3,Bhavya,A,93,Florida
2,Akshay,C,78,New York
1,Aarav,C,75,Texas
0,Arjun,B,82,California


# Loc 🔍🔢


In [352]:
# loc selects by label; here it changes the value in row 0 of the "Name" column

df.loc[0,"Name"]= "Aditya"
df

Unnamed: 0,Name,Grade,Total_Marks,State
0,Aditya,B,82,California
1,Aarav,C,75,Texas
2,Akshay,C,78,New York
3,Bhavya,A,93,Florida
4,Chahat,C,72,Illinois
5,Deepika,A,92,Pennsylvania
6,Esha,C,77,Ohio
7,Farhan,B,87,Georgia
8,Gauri,A,97,North Carolina
9,Harshita,C,78,Michigan


In [353]:
# loc function 

df.loc[[0,1,2],["Name","Grade"]]

Unnamed: 0,Name,Grade
0,Aditya,B
1,Aarav,C
2,Akshay,C


# ILoc 🔍🔢


In [354]:
# iloc selects by integer position: row 0, column 3

df.iloc[0,3]

'California'
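
With the default 0..n-1 index, `loc` and `iloc` look alike. The difference shows once the labels are not positions. A minimal sketch with a made-up three-row frame and a custom index:

```python
import pandas as pd

df_small = pd.DataFrame(
    {"Name": ["Arjun", "Aarav", "Akshay"], "Total_Marks": [82, 75, 78]},
    index=[10, 20, 30],   # labels that differ from positions
)

by_label = df_small.loc[20, "Name"]    # row labelled 20
by_position = df_small.iloc[1, 0]      # second row, first column

label_slice = df_small.loc[10:20]      # label slices include the end label
pos_slice = df_small.iloc[0:1]         # position slices exclude the end

# loc also accepts a Boolean mask
high = df_small.loc[df_small["Total_Marks"] > 76, "Name"]

print(by_label, by_position)
print(len(label_slice), len(pos_slice))
print(high.tolist())
```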

# Drop ❌


In [355]:
# drop features

df.drop("State",axis=1)

Unnamed: 0,Name,Grade,Total_Marks
0,Aditya,B,82
1,Aarav,C,75
2,Akshay,C,78
3,Bhavya,A,93
4,Chahat,C,72
5,Deepika,A,92
6,Esha,C,77
7,Farhan,B,87
8,Gauri,A,97
9,Harshita,C,78


# Add NaN Values 🆕🚫


In [356]:
# replacing some values with nan for further analysis


df=pd.read_csv("data1.csv")

# Replace 'B' in 'Grade' column with NaN (assign the result back rather than using inplace on a single column)
df['Grade'] = df['Grade'].replace('B', np.nan)

# Replace total marks less than 80 with NaN in 'Total_Marks' column
df.loc[df['Total_Marks'] < 80, 'Total_Marks'] = np.nan


In [357]:
df

Unnamed: 0,Name,Grade,Total_Marks,State
0,Arjun,,82.0,California
1,Aarav,C,,Texas
2,Akshay,C,,New York
3,Bhavya,A,93.0,Florida
4,Chahat,C,,Illinois
5,Deepika,A,92.0,Pennsylvania
6,Esha,C,,Ohio
7,Farhan,,87.0,Georgia
8,Gauri,A,97.0,North Carolina
9,Harshita,C,,Michigan


# Dropna 🚫📤


In [358]:
# drop every column that contains at least one NaN

df.dropna(axis=1)

Unnamed: 0,Name,State
0,Arjun,California
1,Aarav,Texas
2,Akshay,New York
3,Bhavya,Florida
4,Chahat,Illinois
5,Deepika,Pennsylvania
6,Esha,Ohio
7,Farhan,Georgia
8,Gauri,North Carolina
9,Harshita,Michigan


In [359]:
# drop every row that contains at least one NaN


df.dropna(axis=0)

Unnamed: 0,Name,Grade,Total_Marks,State
3,Bhavya,A,93.0,Florida
5,Deepika,A,92.0,Pennsylvania
8,Gauri,A,97.0,North Carolina


In [360]:
# drop rows only when the listed columns contain NaN

df.dropna(subset=["Grade"])

Unnamed: 0,Name,Grade,Total_Marks,State
1,Aarav,C,,Texas
2,Akshay,C,,New York
3,Bhavya,A,93.0,Florida
4,Chahat,C,,Illinois
5,Deepika,A,92.0,Pennsylvania
6,Esha,C,,Ohio
8,Gauri,A,97.0,North Carolina
9,Harshita,C,,Michigan


In [361]:
# making the drop permanent (modifies df itself)

df.dropna(inplace=True)

In [362]:
df

Unnamed: 0,Name,Grade,Total_Marks,State
3,Bhavya,A,93.0,Florida
5,Deepika,A,92.0,Pennsylvania
8,Gauri,A,97.0,North Carolina
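
`dropna` has two more parameters that control how strict the drop is: `how` and `thresh`. A minimal sketch on a made-up three-row frame; `reset_index(drop=True)` is one way to renumber the rows afterwards.

```python
import numpy as np
import pandas as pd

df_small = pd.DataFrame({
    "Name": ["Arjun", "Aarav", "Bhavya"],
    "Grade": [np.nan, "C", "A"],
    "Total_Marks": [82.0, np.nan, 93.0],
})

any_missing = df_small.dropna()             # default how="any"
all_missing = df_small.dropna(how="all")    # drop only rows that are entirely NaN
at_least_3 = df_small.dropna(thresh=3)      # keep rows with at least 3 non-NaN values

# Drop by one column, then renumber the remaining rows from 0
cleaned = df_small.dropna(subset=["Grade"]).reset_index(drop=True)

print(any_missing)
print(len(all_missing), len(at_least_3))
print(cleaned)
```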


In [363]:
# re importing fresh data


df=pd.read_csv("data1.csv")

# Replace 'B' in 'Grade' column with NaN (assign the result back rather than using inplace on a single column)
df['Grade'] = df['Grade'].replace('B', np.nan)

# Replace total marks less than 80 with NaN in 'Total_Marks' column
df.loc[df['Total_Marks'] < 80, 'Total_Marks'] = np.nan
df

Unnamed: 0,Name,Grade,Total_Marks,State
0,Arjun,,82.0,California
1,Aarav,C,,Texas
2,Akshay,C,,New York
3,Bhavya,A,93.0,Florida
4,Chahat,C,,Illinois
5,Deepika,A,92.0,Pennsylvania
6,Esha,C,,Ohio
7,Farhan,,87.0,Georgia
8,Gauri,A,97.0,North Carolina
9,Harshita,C,,Michigan


# Fillna 🆕📥


In [364]:
# fill nan values

df.fillna("Missing")

Unnamed: 0,Name,Grade,Total_Marks,State
0,Arjun,Missing,82.0,California
1,Aarav,C,Missing,Texas
2,Akshay,C,Missing,New York
3,Bhavya,A,93.0,Florida
4,Chahat,C,Missing,Illinois
5,Deepika,A,92.0,Pennsylvania
6,Esha,C,Missing,Ohio
7,Farhan,Missing,87.0,Georgia
8,Gauri,A,97.0,North Carolina
9,Harshita,C,Missing,Michigan


In [365]:
# forward fill: each NaN takes the previous valid value in its column
# (df.ffill() replaces the deprecated fillna(method="ffill"))

df.ffill()    

Unnamed: 0,Name,Grade,Total_Marks,State
0,Arjun,,82.0,California
1,Aarav,C,82.0,Texas
2,Akshay,C,82.0,New York
3,Bhavya,A,93.0,Florida
4,Chahat,C,93.0,Illinois
5,Deepika,A,92.0,Pennsylvania
6,Esha,C,92.0,Ohio
7,Farhan,C,87.0,Georgia
8,Gauri,A,97.0,North Carolina
9,Harshita,C,97.0,Michigan


In [366]:
# backward fill along each row: each NaN takes the next valid value to its right
# (df.bfill() replaces the deprecated fillna(method="bfill"))

df.bfill(axis=1)    

Unnamed: 0,Name,Grade,Total_Marks,State
0,Arjun,82.0,82.0,California
1,Aarav,C,Texas,Texas
2,Akshay,C,New York,New York
3,Bhavya,A,93.0,Florida
4,Chahat,C,Illinois,Illinois
5,Deepika,A,92.0,Pennsylvania
6,Esha,C,Ohio,Ohio
7,Farhan,87.0,87.0,Georgia
8,Gauri,A,97.0,North Carolina
9,Harshita,C,Michigan,Michigan


In [367]:
# limit: fill at most 2 NaN values in each column

df.fillna("miss",limit=2)

Unnamed: 0,Name,Grade,Total_Marks,State
0,Arjun,miss,82.0,California
1,Aarav,C,miss,Texas
2,Akshay,C,miss,New York
3,Bhavya,A,93.0,Florida
4,Chahat,C,,Illinois
5,Deepika,A,92.0,Pennsylvania
6,Esha,C,,Ohio
7,Farhan,miss,87.0,Georgia
8,Gauri,A,97.0,North Carolina
9,Harshita,C,,Michigan


In [368]:
# Permanent change (modifies df itself)

df.fillna("missing",inplace=True)
df

Unnamed: 0,Name,Grade,Total_Marks,State
0,Arjun,missing,82.0,California
1,Aarav,C,missing,Texas
2,Akshay,C,missing,New York
3,Bhavya,A,93.0,Florida
4,Chahat,C,missing,Illinois
5,Deepika,A,92.0,Pennsylvania
6,Esha,C,missing,Ohio
7,Farhan,missing,87.0,Georgia
8,Gauri,A,97.0,North Carolina
9,Harshita,C,missing,Michigan
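
Filling every column with the same string turns numeric columns into mixed-type columns. One alternative, sketched here on a made-up frame, is to pass a dictionary so each column gets a fill value of its own type (the choice of "Unknown" and the column mean is only an example):

```python
import numpy as np
import pandas as pd

df_small = pd.DataFrame({
    "Grade": [np.nan, "C", "A"],
    "Total_Marks": [82.0, np.nan, 93.0],
})

# One fill value per column: a label for Grade, the column mean for Total_Marks
filled = df_small.fillna({
    "Grade": "Unknown",
    "Total_Marks": df_small["Total_Marks"].mean(),
})

print(filled)
print(filled.dtypes)
```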


In [369]:
df=pd.read_csv("data1.csv")

df

Unnamed: 0,Name,Grade,Total_Marks,State
0,Arjun,B,82,California
1,Aarav,C,75,Texas
2,Akshay,C,78,New York
3,Bhavya,A,93,Florida
4,Chahat,C,72,Illinois
5,Deepika,A,92,Pennsylvania
6,Esha,C,77,Ohio
7,Farhan,B,87,Georgia
8,Gauri,A,97,North Carolina
9,Harshita,C,78,Michigan


# Replace 🔄


In [370]:
# replacing values

df.replace(to_replace="Arjun",value="Adi")

Unnamed: 0,Name,Grade,Total_Marks,State
0,Adi,B,82,California
1,Aarav,C,75,Texas
2,Akshay,C,78,New York
3,Bhavya,A,93,Florida
4,Chahat,C,72,Illinois
5,Deepika,A,92,Pennsylvania
6,Esha,C,77,Ohio
7,Farhan,B,87,Georgia
8,Gauri,A,97,North Carolina
9,Harshita,C,78,Michigan


In [371]:
# with regex=True, every single letter is replaced by "Ram"

df.replace("[A-Za-z]","Ram",regex=True)

Unnamed: 0,Name,Grade,Total_Marks,State
0,RamRamRamRamRam,Ram,82,RamRamRamRamRamRamRamRamRamRam
1,RamRamRamRamRam,Ram,75,RamRamRamRamRam
2,RamRamRamRamRamRam,Ram,78,RamRamRam RamRamRamRam
3,RamRamRamRamRamRam,Ram,93,RamRamRamRamRamRamRam
4,RamRamRamRamRamRam,Ram,72,RamRamRamRamRamRamRamRam
5,RamRamRamRamRamRamRam,Ram,92,RamRamRamRamRamRamRamRamRamRamRamRam
6,RamRamRamRam,Ram,77,RamRamRamRam
7,RamRamRamRamRamRam,Ram,87,RamRamRamRamRamRamRam
8,RamRamRamRamRam,Ram,97,RamRamRamRamRam RamRamRamRamRamRamRamRam
9,RamRamRamRamRamRamRamRam,Ram,78,RamRamRamRamRamRamRamRam


In [372]:
df

Unnamed: 0,Name,Grade,Total_Marks,State
0,Arjun,B,82,California
1,Aarav,C,75,Texas
2,Akshay,C,78,New York
3,Bhavya,A,93,Florida
4,Chahat,C,72,Illinois
5,Deepika,A,92,Pennsylvania
6,Esha,C,77,Ohio
7,Farhan,B,87,Georgia
8,Gauri,A,97,North Carolina
9,Harshita,C,78,Michigan


In [373]:
# replacing only within a particular column

df.replace({"Grade":"[A-Z]"},"Fail",regex=True)

Unnamed: 0,Name,Grade,Total_Marks,State
0,Arjun,Fail,82,California
1,Aarav,Fail,75,Texas
2,Akshay,Fail,78,New York
3,Bhavya,Fail,93,Florida
4,Chahat,Fail,72,Illinois
5,Deepika,Fail,92,Pennsylvania
6,Esha,Fail,77,Ohio
7,Farhan,Fail,87,Georgia
8,Gauri,Fail,97,North Carolina
9,Harshita,Fail,78,Michigan
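When a column needs several different replacements, a nested dict of the form `{column: {old: new}}` can be passed to `replace`. The sketch below uses a small made-up frame (`grades` is a hypothetical name) in which the `Name` column deliberately contains the same strings, to show that only `Grade` is touched.

```python
import pandas as pd

# Hypothetical frame: "Name" holds the same strings as "Grade" on purpose
grades = pd.DataFrame({"Name": ["A", "B", "C"], "Grade": ["A", "B", "C"]})

# Outer key = column to search, inner dict = old value -> new value
relabelled = grades.replace(
    {"Grade": {"A": "Excellent", "B": "Good", "C": "Average"}}
)
print(relabelled)
```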


In [374]:
# use inplace=True (or assign the result back to df) to make a replacement permanent, as above

In [375]:

df=pd.read_csv("data1.csv")

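# Replace 'B' in the 'Grade' column with NaN.
# Assign the result back: calling inplace=True on a column selection
# may not update df in recent pandas versions (copy-on-write).
df['Grade'] = df['Grade'].replace('B', np.nan)

# Replace total marks below 80 with NaN in the 'Total_Marks' column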
df.loc[df['Total_Marks'] < 80, 'Total_Marks'] = np.nan
df

Unnamed: 0,Name,Grade,Total_Marks,State
0,Arjun,,82.0,California
1,Aarav,C,,Texas
2,Akshay,C,,New York
3,Bhavya,A,93.0,Florida
4,Chahat,C,,Illinois
5,Deepika,A,92.0,Pennsylvania
6,Esha,C,,Ohio
7,Farhan,,87.0,Georgia
8,Gauri,A,97.0,North Carolina
9,Harshita,C,,Michigan
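The boolean-mask assignment above changes `df` directly. As an alternative sketch, `Series.where` keeps the values where a condition holds and puts NaN everywhere else, returning a new Series. The three rows below are copied from the table above; the variable names are illustrative only.

```python
import pandas as pd

# Three rows taken from the notebook's table
scores = pd.DataFrame({
    "Name": ["Arjun", "Aarav", "Bhavya"],
    "Total_Marks": [82, 75, 93],
})

# Keep marks that are at least 80; everything else becomes NaN
masked = scores["Total_Marks"].where(scores["Total_Marks"] >= 80)
print(masked)
```

The result is a float column, because an integer column cannot hold NaN.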


# Interpolate 📏🔄


In [376]:
df.interpolate()

Unnamed: 0,Name,Grade,Total_Marks,State
0,Arjun,,82.0,California
1,Aarav,C,85.666667,Texas
2,Akshay,C,89.333333,New York
3,Bhavya,A,93.0,Florida
4,Chahat,C,92.5,Illinois
5,Deepika,A,92.0,Pennsylvania
6,Esha,C,89.5,Ohio
7,Farhan,,87.0,Georgia
8,Gauri,A,97.0,North Carolina
9,Harshita,C,97.0,Michigan


`interpolate()` accepts several parameters:

- method: The interpolation method to use. Some commonly used methods are:
  - 'linear': Performs linear interpolation, treating the values as equally spaced (default).
  - 'nearest': Fills missing values with the nearest non-missing value.
  - 'zero': Zero-order spline, a step function that holds the previous known value. It does not fill with zeros.
  - 'slinear': First-order spline, i.e. piecewise linear interpolation.
  - 'quadratic': Second-order spline interpolation.
  - 'cubic': Third-order spline interpolation.

  Every method in this list except 'linear' is passed on to SciPy, so SciPy must be installed to use it.

- axis: The axis along which to interpolate. axis=0 (the default) interpolates down each column; axis=1 interpolates across each row.

- limit: The maximum number of consecutive NaN values to fill. For example, limit=1 fills only one NaN in each run of consecutive NaNs.

- limit_direction: The direction in which consecutive NaNs are filled. It can be 'forward', 'backward', or 'both'. 'forward' fills NaNs that come after a valid value, 'backward' fills NaNs that come before a valid value, and 'both' fills in both directions.
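The effect of `limit` and `limit_direction` is easiest to see on a tiny Series. The sketch below uses only the default linear method, so it does not need SciPy. The Series `s` is a made-up example, not data from the notebook.

```python
import numpy as np
import pandas as pd

# Made-up Series with a run of three NaNs between two known values
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

full = s.interpolate()                                       # fill the whole gap
forward_one = s.interpolate(limit=1)                         # one NaN, from the left
backward_one = s.interpolate(limit=1, limit_direction="backward")  # one NaN, from the right
both_one = s.interpolate(limit=1, limit_direction="both")    # one NaN from each side

print(pd.DataFrame({
    "full": full,
    "forward_one": forward_one,
    "backward_one": backward_one,
    "both_one": both_one,
}))
```

With `limit=1`, the middle of the gap stays NaN in every variant, because each direction may fill only one value.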

# Merge 🔗


In [377]:
# Merging DataFrames


var = pd.DataFrame({"a":[1,2,3,4],"b":[1,2,3,4]})
var1 = pd.DataFrame({"a":[1,2,3,5],"c":[4,5,6,7]})

pd.merge(var,var1,on="a")


Unnamed: 0,a,b,c
0,1,1,4
1,2,2,5
2,3,3,6


In [378]:
pd.merge(var,var1,how="left")

Unnamed: 0,a,b,c
0,1,1,4.0
1,2,2,5.0
2,3,3,6.0
3,4,4,


In [379]:
pd.merge(var,var1,how="right")

Unnamed: 0,a,b,c
0,1,1.0,4
1,2,2.0,5
2,3,3.0,6
3,5,,7


In [380]:
pd.merge(var,var1,how="outer", indicator=True)

Unnamed: 0,a,b,c,_merge
0,1,1.0,4.0,both
1,2,2.0,5.0,both
2,3,3.0,6.0,both
3,4,4.0,,left_only
4,5,,7.0,right_only
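The `_merge` column produced by `indicator=True` can be used as a filter. As a rough sketch, the rows that exist only in the left frame (sometimes called an anti-join) can be picked out like this. The frames repeat the `var` and `var1` values from above under the illustrative names `left` and `right`.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 2, 3, 4]})
right = pd.DataFrame({"a": [1, 2, 3, 5], "c": [4, 5, 6, 7]})

# Outer merge with the indicator column, then keep rows found only on the left
merged = pd.merge(left, right, on="a", how="outer", indicator=True)
left_only = merged.loc[merged["_merge"] == "left_only", ["a", "b"]]
print(left_only)
```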


In [381]:
var = pd.DataFrame({"a":[1,2,3,4],"b":[1,2,3,4]})
var1 = pd.DataFrame({"a":[1,2,3,5],"b":[4,5,6,7]})

pd.merge(var,var1,left_index=True,right_index=True, suffixes=("rank","id"))

Unnamed: 0,arank,brank,aid,bid
0,1,1,1,4
1,2,2,2,5
2,3,3,3,6
3,4,4,5,7


# Concat ➕


In [382]:
# concatenating DataFrames

pd.concat([var,var1])

Unnamed: 0,a,b
0,1,1
1,2,2
2,3,3
3,4,4
0,1,4
1,2,5
2,3,6
3,5,7


In [383]:
pd.concat([var,var1],axis=1)

Unnamed: 0,a,b,a,b
0,1,1,1,4
1,2,2,2,5
2,3,3,3,6
3,4,4,5,7


In [384]:
# concat with keys: each input frame gets its own outer index label

pd.concat([var,var1],axis=0,keys=["d1","d2"])

Unnamed: 0,Unnamed: 1,a,b
d1,0,1,1
d1,1,2,2
d1,2,3,3
d1,3,4,4
d2,0,1,4
d2,1,2,5
d2,2,3,6
d2,3,5,7
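Two follow-ups to the concat examples above, as a small sketch: `ignore_index=True` replaces the repeated 0..3 labels with a fresh 0..7 index, and the outer labels added by `keys` can be used with `.loc` to pull one original frame back out. The names `first` and `second` are illustrative copies of `var` and `var1`.

```python
import pandas as pd

first = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 2, 3, 4]})
second = pd.DataFrame({"a": [1, 2, 3, 5], "b": [4, 5, 6, 7]})

# Fresh 0..7 index instead of the repeated 0..3 labels
stacked = pd.concat([first, second], ignore_index=True)

# Outer index level "d1"/"d2", then select the rows that came from `second`
keyed = pd.concat([first, second], keys=["d1", "d2"])
from_second = keyed.loc["d2"]

print(stacked)
print(from_second)
```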


In [385]:
var = pd.DataFrame({"name": ["a","b","c","d","b","c","a","b","d"],
                   "hindi": [89,56,85,56,78,98,65,15,23],
                   "english": [98,45,87,65,94,56,87,25,35]})
var

Unnamed: 0,name,hindi,english
0,a,89,98
1,b,56,45
2,c,85,87
3,d,56,65
4,b,78,94
5,c,98,56
6,a,65,87
7,b,15,25
8,d,23,35


# Grouping 🔢🧮


In [386]:
# grouping dataframe

var1 = var.groupby("name")
var1

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x786adae07100>

In [387]:
# iterating over the groups: x is the group key, y is that group's rows

for x,y in var1:
    print(x)
    print(y)
    print("----------------------")

a
  name  hindi  english
0    a     89       98
6    a     65       87
----------------------
b
  name  hindi  english
1    b     56       45
4    b     78       94
7    b     15       25
----------------------
c
  name  hindi  english
2    c     85       87
5    c     98       56
----------------------
d
  name  hindi  english
3    d     56       65
8    d     23       35
----------------------


In [388]:
# extract by particular name

var1.get_group("b")

Unnamed: 0,name,hindi,english
1,b,56,45
4,b,78,94
7,b,15,25


In [389]:
var1.min()

Unnamed: 0_level_0,hindi,english
name,Unnamed: 1_level_1,Unnamed: 2_level_1
a,65,87
b,15,25
c,85,56
d,23,35


In [390]:
var1.max()

Unnamed: 0_level_0,hindi,english
name,Unnamed: 1_level_1,Unnamed: 2_level_1
a,89,98
b,78,94
c,98,87
d,56,65


In [391]:
var1.mean()

Unnamed: 0_level_0,hindi,english
name,Unnamed: 1_level_1,Unnamed: 2_level_1
a,77.0,92.5
b,49.666667,54.666667
c,91.5,71.5
d,39.5,50.0
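Instead of calling `min()`, `max()` and `mean()` one at a time, several statistics can be computed in one pass with `agg`. The sketch below uses named aggregation, where each keyword is an output column and each value is a `(source column, function)` pair. The output column names are made up for illustration; the data is the same as in the grouping example.

```python
import pandas as pd

scores = pd.DataFrame({
    "name": ["a", "b", "c", "d", "b", "c", "a", "b", "d"],
    "hindi": [89, 56, 85, 56, 78, 98, 65, 15, 23],
    "english": [98, 45, 87, 65, 94, 56, 87, 25, 35],
})

# keyword = output column, value = (input column, aggregation function)
summary = scores.groupby("name").agg(
    hindi_mean=("hindi", "mean"),
    english_max=("english", "max"),
    rows=("hindi", "count"),
)
print(summary)
```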


In [392]:
# converting it to a list of (group key, DataFrame) tuples

lst = list(var1)
lst


[('a',
    name  hindi  english
  0    a     89       98
  6    a     65       87),
 ('b',
    name  hindi  english
  1    b     56       45
  4    b     78       94
  7    b     15       25),
 ('c',
    name  hindi  english
  2    c     85       87
  5    c     98       56),
 ('d',
    name  hindi  english
  3    d     56       65
  8    d     23       35)]

# Join 🔗


In [393]:
# joining dataframes

var = pd.DataFrame({"a":[1,2,3,4],"b":[1,2,3,4]},index=["a","b","c","d"])
var1 = pd.DataFrame({"d":[1,5],"c":[4,7]},index=["a","b"])

var.join(var1)

Unnamed: 0,a,b,d,c
a,1,1,1.0,4.0
b,2,2,5.0,7.0
c,3,3,,
d,4,4,,


In [394]:
# joining in the opposite order gives a different result

var1.join(var)

Unnamed: 0,d,c,a,b
a,1,4,1,1
b,5,7,2,2


In [395]:
var1.join(var,how="left")

Unnamed: 0,d,c,a,b
a,1,4,1,1
b,5,7,2,2


In [396]:
var1.join(var,how="right")

Unnamed: 0,d,c,a,b
a,1.0,4.0,1,1
b,5.0,7.0,2,2
c,,,3,3
d,,,4,4


In [397]:
var1.join(var,how="outer")

Unnamed: 0,d,c,a,b
a,1.0,4.0,1,1
b,5.0,7.0,2,2
c,,,3,3
d,,,4,4


In [398]:
var1.join(var,how="inner")

Unnamed: 0,d,c,a,b
a,1,4,1,1
b,5,7,2,2


In [399]:
var = pd.DataFrame({"a":[1,2,3,4],"b":[1,2,3,4]},index=["a","b","c","d"])
var1 = pd.DataFrame({"a":[1,5],"c":[4,7]},index=["a","b"])

var1.join(var,how="outer", lsuffix="_left",rsuffix="_right")

Unnamed: 0,a_left,c,a_right,b
a,1.0,4.0,1,1
b,5.0,7.0,2,2
c,,,3,3
d,,,4,4
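`join` aligns rows on the index, so it behaves like `pd.merge` with `left_index=True` and `right_index=True`. A small sketch comparing the two, using the frames from the first join example under the illustrative names `left` and `right`:

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 2, 3, 4]},
                    index=["a", "b", "c", "d"])
right = pd.DataFrame({"d": [1, 5], "c": [4, 7]}, index=["a", "b"])

# join defaults to how="left" and matches on the index
joined = left.join(right)

# The same operation written as an index-on-index merge
merged = pd.merge(left, right, left_index=True, right_index=True, how="left")

print(joined.equals(merged))
```

Use `merge` when the keys are ordinary columns, and `join` when they are already the index.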


# Append ➕


In [400]:
# Append: DataFrame.append was deprecated and then removed in pandas 2.0;
# pd.concat gives the same result

pd.concat([var1, var], ignore_index=True)


Unnamed: 0,a,c,b
0,1,4.0,
1,5,7.0,
2,1,,1.0
3,2,,2.0
4,3,,3.0
5,4,,4.0


In [401]:
var = pd.DataFrame({"days": [1,2,3,4,5],"eng": [65,98,78,55,64],"maths":[65,78,54,98,28]})
var


Unnamed: 0,days,eng,maths
0,1,65,65
1,2,98,78
2,3,78,54
3,4,55,98
4,5,64,28


# Melt 🔥


In [402]:
# Melt 

pd.melt(var)

Unnamed: 0,variable,value
0,days,1
1,days,2
2,days,3
3,days,4
4,days,5
5,eng,65
6,eng,98
7,eng,78
8,eng,55
9,eng,64
10,maths,65
11,maths,78
12,maths,54
13,maths,98
14,maths,28


In [403]:
# keep "days" as the identifier column; eng and maths are melted into "Subject" and "marks"

pd.melt(var,id_vars="days",var_name="Subject",value_name="marks")

Unnamed: 0,days,Subject,marks
0,1,eng,65
1,2,eng,98
2,3,eng,78
3,4,eng,55
4,5,eng,64
5,1,maths,65
6,2,maths,78
7,3,maths,54
8,4,maths,98
9,5,maths,28
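Melting is reversible. As a minimal sketch, `pivot` can turn the long table back into the original wide layout, with one column per subject. The names `wide`, `long` and `restored` are illustrative.

```python
import pandas as pd

wide = pd.DataFrame({
    "days": [1, 2, 3, 4, 5],
    "eng": [65, 98, 78, 55, 64],
    "maths": [65, 78, 54, 98, 28],
})

# Wide -> long: one row per (day, subject) pair
long = pd.melt(wide, id_vars="days", var_name="Subject", value_name="marks")

# Long -> wide again: subjects become columns
restored = long.pivot(index="days", columns="Subject", values="marks").reset_index()
restored.columns.name = None  # drop the leftover "Subject" label on the columns

print(restored)
```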


In [404]:
var = pd.DataFrame({"days": [1,2,3,4,5],"name":["a","b","a","b","a"],"eng": [65,98,78,55,64],"maths":[65,78,54,98,28]})
var

Unnamed: 0,days,name,eng,maths
0,1,a,65,65
1,2,b,98,78
2,3,a,78,54
3,4,b,55,98
4,5,a,64,28


# Pivot Table 📊👛


In [405]:
# pivot table 

var.pivot_table(index="days",columns="name")

Unnamed: 0_level_0,eng,eng,maths,maths
name,a,b,a,b
days,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
1,65.0,,65.0,
2,,98.0,,78.0
3,78.0,,54.0,
4,,55.0,,98.0
5,64.0,,28.0,


In [406]:
var.pivot_table(index="days",columns="name",values="eng")

name,a,b
days,Unnamed: 1_level_1,Unnamed: 2_level_1
1,65.0,
2,,98.0
3,78.0,
4,,55.0
5,64.0,
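The NaN cells above mark day and name combinations that have no rows. As a sketch, `fill_value` puts a chosen value in those cells, and `aggfunc` swaps the default mean for another aggregation. The frame repeats the `days`, `name` and `eng` columns from above under the illustrative name `frame`.

```python
import pandas as pd

frame = pd.DataFrame({
    "days": [1, 2, 3, 4, 5],
    "name": ["a", "b", "a", "b", "a"],
    "eng": [65, 98, 78, 55, 64],
})

# Sum instead of mean, and 0 instead of NaN for missing combinations
table = frame.pivot_table(index="days", columns="name", values="eng",
                          aggfunc="sum", fill_value=0)
print(table)
```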


In [407]:
var = pd.DataFrame({"days": [1,1,1,2,2],"name":["a","b","a","b","a"],"eng": [65,98,78,55,64],"maths":[65,78,54,98,28]})
var

Unnamed: 0,days,name,eng,maths
0,1,a,65,65
1,1,b,98,78
2,1,a,78,54
3,2,b,55,98
4,2,a,64,28


In [408]:
var.pivot_table(index="name",columns="days",
               aggfunc="mean",margins=True)

Unnamed: 0_level_0,eng,eng,eng,maths,maths,maths
days,1,2,All,1,2,All
name,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
a,71.5,64.0,69.0,59.5,28.0,49.0
b,98.0,55.0,76.5,78.0,98.0,88.0
All,80.333333,59.5,72.0,65.666667,63.0,64.6
