**Basic DataFrame Operations**

In [None]:
================== Data Analysis ====================
1.   Pandas              : Data frame analysis
2.   Numpy               : Numerical analysis
3.   Matplotlib          : Graphs and plots
4.   Seaborn             : Plots
5.   Bokeh               : Plots
6.   Plotly              : Plots

================== Machine Learning =================
7.   Stats               : Statistical models
8.   Scikit-learn        : ML models

================== Deep Learning ====================
9.   OpenCV              : Computer vision
10.  Pillow              : Image operations
11.  TensorFlow          : Neural network creation (Google)
12.  Keras               : Neural networks
13.  PyTorch             : Alternative to TensorFlow (Meta/Facebook)

====================== NLP ==========================
14.  NLTK                : Natural language toolkit
15.  SpaCy               : Alternative to NLTK
16.  WordCloud           : Most frequent words

==================== Scraping =======================
17.  SQLite              : Database creation
18.  Beautiful Soup      : Web scraping
19.  Selenium            : Web automation

================== API Creation =====================
20.  Flask               : Web framework
21.  FastAPI             : Web framework
22.  Gradio              : API/UI creation
23.  Django              : Web framework

================== UI App Creation ==================
24.  Streamlit           : UI/Apps for ML/DL

========== Transfer Learning Models (DL) ============
25.  MobileNet           : CNN model
26.  ResNet              : CNN model
27.  VGGNet              : CNN model
28.  Inception           : CNN model
29.  YOLO                : Object detection (Ultralytics)

========== Transfer Learning Models (NLP) ===========
30.  Word2Vec            : Word embeddings
31.  GloVe               : Word embeddings

=============== Hugging Face Transformers ===========
32.  BERT                : Bidirectional Encoder Representations from Transformers

==================== Allen NLP ======================
33.  AllenNLP            : NLP research library

====================== GenAI =========================
34.  LangChain           : LLM orchestration
35.  Google Gemini       : Google’s GenAI models
36.  OpenAI GPT          : OpenAI’s LLMs
37.  Amazon Bedrock      : AWS GenAI platform
38.  Meta LLaMA          : Meta’s LLM family

================ Image and Video Generation ==========
39.  GAN models          : Image/Video generation packages
40.  SORA                : Video generation (OpenAI)

================ Model Deployment (MLOps) ============
41.  MLflow              : Experiment tracking and model registry (Databricks)
42.  Kubeflow            : ML workflows on Kubernetes (originated at Google)

================== Cloud Applications ================
43.  Azure ML            : Microsoft cloud ML platform
44.  GCP Vertex AI       : Google cloud ML platform
45.  AWS SageMaker       : AWS cloud ML platform

======================= Small ========================
46.  Time                : Time operations
47.  Logging             : Logging utilities
48.  Math                : Mathematical operations
49.  Random              : Random number generation
50.  Env                 : Environment variables
51.  OS                  : Operating system utilities


**Step-1: Import Packages**

In [1]:
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns 

**Step-2: Create a DataFrame using List**

- data : the values, given as a list or a dictionary

- index : a list of row labels

- columns : a list of column names

In [3]:
names=['Ramesh','Suresh','Satish']
pd.DataFrame(names,columns=['Names'])
# general form: pd.DataFrame(data, index, columns)

Unnamed: 0,Names
0,Ramesh
1,Suresh
2,Satish
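
The `data` argument also accepts a dictionary, as listed above. A minimal sketch, assuming each key is a column name and each value is a list of equal length; the name `df_from_dict` and the `Age` values are illustrative only:

```python
import pandas as pd

# Each key becomes a column name; each list becomes that column's values.
data = {'Names': ['Ramesh', 'Suresh', 'Satish'], 'Age': [20, 22, 24]}
df_from_dict = pd.DataFrame(data)

print(df_from_dict.columns.tolist())  # ['Names', 'Age']
print(df_from_dict.shape)             # (3, 2)
```

With a dictionary, the `columns` argument is usually unnecessary because the keys already supply the column names.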


**Step-3: Change the Index**

In [6]:
names=['Ramesh','Suresh','Satish']
idx=['A','B','C']
cols=['Names']
pd.DataFrame(names,index=idx,columns=cols)

Unnamed: 0,Names
A,Ramesh
B,Suresh
C,Satish
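
The index can also be changed after the DataFrame exists. A small sketch, assuming the new labels are assigned directly to the `index` attribute; the name `df_idx` is introduced here only for illustration:

```python
import pandas as pd

df_idx = pd.DataFrame(['Ramesh', 'Suresh', 'Satish'], columns=['Names'])

# Assigning a list to .index relabels the rows; its length must match the row count.
df_idx.index = ['A', 'B', 'C']

print(df_idx.index.tolist())     # ['A', 'B', 'C']
print(df_idx.loc['B', 'Names'])  # Suresh
```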


**Step-4: Add Multiple Columns**

In [9]:
names=['Ramesh','Suresh','Satish']
idx=['A','B','C']
cols=['Names','Age']
age=[20,22,24]
pd.DataFrame(zip(names,age),index=idx,columns=cols)

Unnamed: 0,Names,Age
A,Ramesh,20
B,Suresh,22
C,Satish,24


**Step-5: Create empty DataFrame and update the columns**

In [15]:
df=pd.DataFrame()
df['Names']=['Ramesh','Suresh','Satish']
df['Age']=[20,22,24]
df

Unnamed: 0,Names,Age
0,Ramesh,20
1,Suresh,22
2,Satish,24


**Step-6: Add a new column with existing DataFrame**

In [33]:
names=['Ramesh','Suresh','Satish']
idx=['A','B','C']
cols=['Names','Age']
age=[20,22,24]
df=pd.DataFrame(zip(names,age),index=idx,columns=cols)
df['City']=['Pune','Hyderabad','Bangalore']
df

Unnamed: 0,Names,Age,City
A,Ramesh,20,Pune
B,Suresh,22,Hyderabad
C,Satish,24,Bangalore
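
A column can also be added without modifying the original frame. A hedged sketch using `DataFrame.assign`, which returns a new DataFrame; the name `df_with_city` is hypothetical:

```python
import pandas as pd

names = ['Ramesh', 'Suresh', 'Satish']
age = [20, 22, 24]
df = pd.DataFrame(zip(names, age), index=['A', 'B', 'C'], columns=['Names', 'Age'])

# assign() returns a copy with the extra column; df itself keeps two columns.
df_with_city = df.assign(City=['Pune', 'Hyderabad', 'Bangalore'])

print(df.columns.tolist())            # ['Names', 'Age']
print(df_with_city.columns.tolist())  # ['Names', 'Age', 'City']
```

This form is convenient when the original frame should stay intact for later cells.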


**Step-7: Override the column values**
- I want to overwrite the age values
- Originally the ages were [20,22,24]
- I want to update them to [30,32,34]

In [36]:
names=['Ramesh','Suresh','Satish']
idx=['A','B','C']
cols=['Names','Age']
age=[20,22,24]
df=pd.DataFrame(zip(names,age),index=idx,columns=cols)
df['Age']=[30,32,34]
df

Unnamed: 0,Names,Age
A,Ramesh,30
B,Suresh,32
C,Satish,34


In [35]:
names=['Ramesh','Suresh','Satish']
idx=['A','B','C']
cols=['Names','Age']
age=[20,22,24]
df=pd.DataFrame(zip(names,age),index=idx,columns=cols)
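df['Age']=[30,32,34] # existing column: the values are overwritten
df['age']=[30,32,34] # column names are case-sensitive, so this adds a new column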
df

Unnamed: 0,Names,Age,age
A,Ramesh,30,30
B,Suresh,32,32
C,Satish,34,34
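
To overwrite a single value rather than the whole column, label-based assignment with `.loc` is one option. A minimal sketch; the new value 40 is an arbitrary example:

```python
import pandas as pd

names = ['Ramesh', 'Suresh', 'Satish']
df = pd.DataFrame(zip(names, [30, 32, 34]), index=['A', 'B', 'C'], columns=['Names', 'Age'])

# .loc[row_label, column_label] targets one cell; the rest of the column is untouched.
df.loc['B', 'Age'] = 40

print(df['Age'].tolist())  # [30, 40, 34]
```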


**Step-8: Drop The Column**

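- labels: the row or column label(s) to drop

- axis: 0 for rows (the default), 1 for columns

- index: row label(s) to drop, an alternative to labels with axis=0

- columns: column label(s) to drop, an alternative to labels with axis=1

- inplace: if True, modify the DataFrame itself instead of returning a new one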

In [20]:
df

Unnamed: 0,Names,Age,age
A,Ramesh,30,30
B,Suresh,32,32
C,Satish,34,34


In [22]:
df.drop('age') # raises KeyError
# pandas treats 'age' as a label but must know where to look for it:
# axis=0 means row (index) labels
# axis=1 means column labels
# the default is axis=0, and there is no row labelled 'age'

KeyError: "['age'] not found in axis"

In [24]:
df.drop('age',axis=0)

KeyError: "['age'] not found in axis"

In [30]:
df.drop('age',axis=1) # returns a new DataFrame; without inplace=True, df is unchanged
df

Unnamed: 0,Names,Age,age
A,Ramesh,30,30
B,Suresh,32,32
C,Satish,34,34


In [38]:
df.drop('age',axis=1,inplace=True)
df

Unnamed: 0,Names,Age
A,Ramesh,30
B,Suresh,32
C,Satish,34


In [41]:
names=['Ramesh','Suresh','Satish']
idx=['A','B','C']
cols=['Names','Age']
age=[20,22,24]
df=pd.DataFrame(zip(names,age),index=idx,columns=cols)
df['Age']=[30,32,34]
df['age']=[11,12,13]
df

Unnamed: 0,Names,Age,age
A,Ramesh,30,11
B,Suresh,32,12
C,Satish,34,13


In [42]:
df.drop(columns=['age'],inplace=True)
# axis is not needed when columns= is used
df

Unnamed: 0,Names,Age
A,Ramesh,30
B,Suresh,32
C,Satish,34


In [49]:
names=['Ramesh','Suresh','Satish']
idx=['A','B','C']
cols=['Names','Age']
age=[20,22,24]
df=pd.DataFrame(zip(names,age),index=idx,columns=cols)
df['Age']=[30,32,34]
df['age']=[11,12,13]
df

Unnamed: 0,Names,Age,age
A,Ramesh,30,11
B,Suresh,32,12
C,Satish,34,13


In [51]:
df.drop('C',axis=0)

Unnamed: 0,Names,Age,age
A,Ramesh,30,11
B,Suresh,32,12
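
The cells above can be summarised in one sketch. It assumes the same three-column frame and uses reassignment instead of `inplace=True`; the names `trimmed`, `same` and `no_c` are hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    {'Names': ['Ramesh', 'Suresh', 'Satish'], 'Age': [30, 32, 34], 'age': [11, 12, 13]},
    index=['A', 'B', 'C'],
)

# drop() returns a new DataFrame; storing the result avoids inplace=True.
trimmed = df.drop(columns=['age'])

# errors='ignore' suppresses the KeyError when a label is missing.
same = trimmed.drop(columns=['age'], errors='ignore')

# index= drops rows by label, equivalent to drop('C', axis=0).
no_c = df.drop(index='C')

print(trimmed.columns.tolist())  # ['Names', 'Age']
print(no_c.index.tolist())       # ['A', 'B']
print(df.shape)                  # (3, 3), the original is untouched
```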


**Step-9: Rename The Columns**
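- mapper: a dictionary (or function) from old labels to new ones, e.g. {'City':'Cities'}
- index: a mapper applied to the row labels
- columns: a mapper applied to the column labels
- axis: which axis the mapper targets; 0 (rows) by default, 1 for columns
- copy: whether to copy the underlying data
- inplace: if True, rename within the DataFrame itself instead of returning a new one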

In [16]:
import pandas as pd
names=['Ramesh','Suresh','Satish']
idx=['A','B','C']
cols=['Names','Age']
age=[20,22,24]
df=pd.DataFrame(zip(names,age),index=idx,columns=cols)
df['City']=['Pune','Hyderabad','Bangalore']
df

Unnamed: 0,Names,Age,City
A,Ramesh,20,Pune
B,Suresh,22,Hyderabad
C,Satish,24,Bangalore


In [4]:
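dict={'City':'Cities'} # note: this name shadows Python's built-in dict
df.rename(dict) # no axis given, so the mapper targets the row labels; 'City' is not a row label
df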

Unnamed: 0,Names,Age,City
A,Ramesh,20,Pune
B,Suresh,22,Hyderabad
C,Satish,24,Bangalore


In [6]:
df.rename(columns=dict) # returns a renamed copy; without inplace=True, df is unchanged
df

Unnamed: 0,Names,Age,City
A,Ramesh,20,Pune
B,Suresh,22,Hyderabad
C,Satish,24,Bangalore


In [7]:
df.rename(columns=dict,inplace=True)
df

Unnamed: 0,Names,Age,Cities
A,Ramesh,20,Pune
B,Suresh,22,Hyderabad
C,Satish,24,Bangalore


In [25]:
import pandas as pd
names=['Ramesh','Suresh','Satish']
idx=['A','B','C']
cols=['Names','Age']
age=[20,22,24]
df=pd.DataFrame(zip(names,age),index=idx,columns=cols)
df['City']=['Pune','Hyderabad','Bangalore']
df.rename(dict,axis=1) # the result is not stored, so df still shows 'City'
df

Unnamed: 0,Names,Age,City
A,Ramesh,20,Pune
B,Suresh,22,Hyderabad
C,Satish,24,Bangalore


In [11]:
df.rename(dict,axis=1,inplace=True)
df

Unnamed: 0,Names,Age,Cities
A,Ramesh,20,Pune
B,Suresh,22,Hyderabad
C,Satish,24,Bangalore


In [26]:
dict={'B':2}
df.rename(dict) # targets the row labels, but the result is not stored
df

Unnamed: 0,Names,Age,City
A,Ramesh,20,Pune
B,Suresh,22,Hyderabad
C,Satish,24,Bangalore


In [22]:
dict={'A':1,'B':2,'C':3}
df.rename(dict,inplace=True)
df

Unnamed: 0,Names,Age,City
1,Ramesh,20,Pune
2,Suresh,22,Hyderabad
3,Satish,24,Bangalore
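
The rename variants above can be combined in a single call. A hedged sketch, assuming the same three-row frame; the names `renamed` and `upper` are hypothetical, and the mappings are written inline rather than stored in a variable called `dict`:

```python
import pandas as pd

df = pd.DataFrame(
    {'Names': ['Ramesh', 'Suresh', 'Satish'], 'Age': [20, 22, 24],
     'City': ['Pune', 'Hyderabad', 'Bangalore']},
    index=['A', 'B', 'C'],
)

# columns= and index= can be combined; labels missing from a mapping stay as they are.
renamed = df.rename(columns={'City': 'Cities'}, index={'A': 1, 'B': 2, 'C': 3})

# A function also works as a mapper; here every column label is upper-cased.
upper = df.rename(columns=str.upper)

print(renamed.columns.tolist())  # ['Names', 'Age', 'Cities']
print(renamed.index.tolist())    # [1, 2, 3]
print(upper.columns.tolist())    # ['NAMES', 'AGE', 'CITY']
```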


In [61]:
df1=pd.DataFrame()
df1['Value']=[i for i in range(1,11)]
df1['Square_Value']=[i**2 for i in range(1,11)]
df1['Cube_Value']=[i**3 for i in range(1,11)]
df1

Unnamed: 0,Value,Square_Value,Cube_Value
0,1,1,1
1,2,4,8
2,3,9,27
3,4,16,64
4,5,25,125
5,6,36,216
6,7,49,343
7,8,64,512
8,9,81,729
9,10,100,1000


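- len: number of rows

- columns: the column labels

- shape: (rows, columns) tuple

- dtypes: data type of each column

- head: first five rows by default

- tail: last five rows by default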


In [31]:
len(df1)

10

In [34]:
df1.columns

Index(['Value', 'Square_Value', 'Cube_Value'], dtype='object')

In [42]:
df1.columns.to_list()

['Value', 'Square_Value', 'Cube_Value']

In [35]:
df1.shape

(10, 3)

In [45]:
df1.dtypes
# integer columns are reported as int64
# float columns are reported as float64
# string columns are typically reported as object

Value           int64
Square_Value    int64
Cube_Value      int64
dtype: object

In [43]:
df1.head()

Unnamed: 0,Value,Square_Value,Cube_Value
0,1,1,1
1,2,4,8
2,3,9,27
3,4,16,64
4,5,25,125


In [44]:
df1.tail()

Unnamed: 0,Value,Square_Value,Cube_Value
5,6,36,216
6,7,49,343
7,8,64,512
8,9,81,729
9,10,100,1000


In [48]:
df1.isnull()
# isnull() asks, for every cell, whether the value is missing
# True means the value is null
# False means the value is present

Unnamed: 0,Value,Square_Value,Cube_Value
0,False,False,False
1,False,False,False
2,False,False,False
3,False,False,False
4,False,False,False
5,False,False,False
6,False,False,False
7,False,False,False
8,False,False,False
9,False,False,False


In [49]:
df1.isnull().sum()
# null-value count for each column

Value           0
Square_Value    0
Cube_Value      0
dtype: int64

In [50]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   Value         10 non-null     int64
 1   Square_Value  10 non-null     int64
 2   Cube_Value    10 non-null     int64
dtypes: int64(3)
memory usage: 372.0 bytes
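
Because `df1` has no missing values, every count above is zero. A small sketch with a hypothetical frame, `df_missing`, that does contain missing entries shows what non-zero counts look like:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing entries, used only to show non-zero counts.
df_missing = pd.DataFrame({'Value': [1, 2, np.nan], 'Label': ['a', None, 'c']})

null_counts = df_missing.isnull().sum()

print(null_counts['Value'])       # 1
print(null_counts['Label'])       # 1
print(df_missing['Value'].dtype)  # float64, because NaN forces a float column
```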


**Step-10: Change a Specific Value of a column**

- df1.drop: drops a column or a row (index label)

- df1.rename: renames a column or an index label

- df1.replace: replaces an existing value with a new value

In [62]:
dict2={64:464,1:111} # every matching value, in any column, is replaced
df1.replace(dict2) # returns a new DataFrame; df1 itself is unchanged

Unnamed: 0,Value,Square_Value,Cube_Value
0,111,111,111
1,2,4,8
2,3,9,27
3,4,16,464
4,5,25,125
5,6,36,216
6,7,49,343
7,8,464,512
8,9,81,729
9,10,100,1000
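
The flat dictionary above replaces 64 wherever it occurs, including in Square_Value. To limit the replacement to one column, `replace` also accepts a nested dictionary. A minimal sketch, assuming only Cube_Value should change; the name `replaced` is hypothetical:

```python
import pandas as pd

df1 = pd.DataFrame({'Value': range(1, 11)})
df1['Square_Value'] = df1['Value'] ** 2
df1['Cube_Value'] = df1['Value'] ** 3

# Outer key = column name, inner dict = {old value: new value}.
replaced = df1.replace({'Cube_Value': {64: 464}})

print(replaced.loc[3, 'Cube_Value'])    # 464
print(replaced.loc[7, 'Square_Value'])  # 64, untouched
print(df1.loc[3, 'Cube_Value'])         # 64, replace() returned a new frame
```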


**Step-11: Selection Of A Column**

In [63]:
df1['Cube_Value']
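# single brackets return a Series
# a Series is one-dimensional: an index plus a single column of values
# a DataFrame is two-dimensional, with rows and columns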

0       1
1       8
2      27
3      64
4     125
5     216
6     343
7     512
8     729
9    1000
Name: Cube_Value, dtype: int64

In [64]:
type(df1)

pandas.core.frame.DataFrame

In [65]:
type(df1['Cube_Value'])

pandas.core.series.Series

In [66]:
df1.shape
# 2-D data

(10, 3)

In [68]:
df1['Cube_Value'].shape
# 1-D data

(10,)

In [72]:
df1[['Cube_Value']]

Unnamed: 0,Cube_Value
0,1
1,8
2,27
3,64
4,125
5,216
6,343
7,512
8,729
9,1000


In [70]:
type(df1[['Cube_Value']])

pandas.core.frame.DataFrame

In [71]:
df1[['Cube_Value']].shape

(10, 1)

In [74]:
# to select several columns, pass the names as a list
# without the inner list, pandas treats the names as one tuple key and raises KeyError
df1['Cube_Value','Square_Value']

KeyError: ('Cube_Value', 'Square_Value')

In [75]:
df1[['Cube_Value','Square_Value']]

Unnamed: 0,Cube_Value,Square_Value
0,1,1
1,8,4
2,27,9
3,64,16
4,125,25
5,216,36
6,343,49
7,512,64
8,729,81
9,1000,100
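
The list of column names can be kept in a variable, and the result follows the order of that list. A short sketch on a three-row version of the frame; the names `wanted` and `subset` are hypothetical:

```python
import pandas as pd

df1 = pd.DataFrame({'Value': [1, 2, 3]})
df1['Square_Value'] = df1['Value'] ** 2
df1['Cube_Value'] = df1['Value'] ** 3

wanted = ['Cube_Value', 'Value']  # hypothetical selection, in the order we want back
subset = df1[wanted]

print(subset.columns.tolist())        # ['Cube_Value', 'Value']
print(type(df1['Value']).__name__)    # Series
print(type(df1[['Value']]).__name__)  # DataFrame
```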
