
Pandas is a Python library for data manipulation, analysis, and preprocessing, all of which are core steps in AI and machine learning workflows. It provides two main data structures:
	•	Series (1D labeled array)
	•	DataFrame (2D table, similar to an Excel sheet)

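The cells below only build DataFrames, so here is a minimal sketch of the other structure, a Series. The names `ages` and `people` are illustrative and were chosen so they do not overwrite the notebook's `df`.

```python
import pandas as pd

# A Series is a 1D array whose values carry index labels
ages = pd.Series([25, 30, 35], index=['Alice', 'Bob', 'Charlie'], name='Age')
print(ages['Bob'])    # label-based lookup -> 30
print(ages.mean())    # 30.0

# Each column of a DataFrame is itself a Series
people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
print(type(people['Age']))  # <class 'pandas.core.series.Series'>
```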
In [1]:
import pandas as pd

In [23]:
data={'Name':['Alice','Bob','Charlie'],
      'Age':[25,30,35],
      'Salary':[50000,60000,70000],
      'Category': ['A', 'B', 'A']}
df = pd.DataFrame(data)
print(df)

      Name  Age  Salary Category
0    Alice   25   50000        A
1      Bob   30   60000        B
2  Charlie   35   70000        A


In [None]:
print(df.shape)      # (rows, columns)
print(df.columns)    # Column names
df.info()            # Column dtypes, non-null counts & memory usage (prints directly and returns None)
print(df.describe()) # Summary statistics for numeric columns (count, mean, std, min, quartiles, max)

In [4]:
print(df['Name'])
print(df[['Name','Age']])
print(df.iloc[0]) # first row, selected by integer position
print(df.loc[0,'Name']) # row label 0, 'Name' column (selected by label)

0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
Name        Alice
Age            25
Salary      50000
Category        A
Name: 0, dtype: object
Alice

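Besides selecting by position or by label, rows are often selected with a boolean condition. A minimal sketch on a copy of the sample data (the name `people` is illustrative):

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                       'Age': [25, 30, 35],
                       'Salary': [50000, 60000, 70000]})

# A boolean mask selects rows; .loc takes (row selector, column selector)
older = people.loc[people['Age'] > 28, ['Name', 'Salary']]
print(older)

# Combine conditions with & or |, wrapping each condition in parentheses
print(people[(people['Age'] > 28) & (people['Salary'] < 70000)]['Name'].tolist())  # ['Bob']
```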

In [5]:
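df.dropna()   # returns a copy without rows that contain missing values (df itself is unchanged)
df.fillna(0)  # returns a copy with missing values replaced by 0; only this last result is displayed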

      Name  Age  Salary Category
0    Alice   25   50000        A
1      Bob   30   60000        B
2  Charlie   35   70000        A

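The sample DataFrame has no missing values, so both calls above return it unchanged. The sketch below uses a small hypothetical DataFrame with gaps to show what `dropna()` and `fillna()` actually do; filling `Salary` with the column mean is only one possible strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Name and one missing Salary
raw = pd.DataFrame({'Name': ['Alice', 'Bob', None],
                    'Salary': [50000, np.nan, 70000]})

print(raw.isna().sum())   # number of missing values per column
dropped = raw.dropna()    # keeps only row 0, since rows 1 and 2 each have a gap
filled = raw.fillna({'Name': 'Unknown', 'Salary': raw['Salary'].mean()})  # per-column fill values
print(dropped)
print(filled)
```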

In [9]:
df['New Column'] = df['Salary'] + 10000  # creates a new column
df.rename(columns={'New Column': 'Updated_salary'}, inplace=True)  # renames it in place; re-running this cell adds another 'Updated_salary' column
print(df)

      Name  Age  Salary Category  Updated_salary
0    Alice   25   50000        A           60000
1      Bob   30   60000        B           70000
2  Charlie   35   70000        A           80000

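An alternative sketch for adding a column uses `DataFrame.assign`, which returns a new DataFrame and overwrites any column that already has the target name. Running it twice therefore does not pile up duplicate columns, which can happen when the `rename(..., inplace=True)` cell above is re-run. The `staff` data is hypothetical.

```python
import pandas as pd

staff = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]})

# assign() returns a new DataFrame; an existing column with the same name is
# overwritten rather than duplicated, so repeating the line is harmless
staff = staff.assign(Updated_salary=staff['Salary'] + 10000)
staff = staff.assign(Updated_salary=staff['Salary'] + 10000)
print(staff.columns.tolist())  # ['Name', 'Salary', 'Updated_salary']
```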

In [13]:
df_numeric = df.select_dtypes(include=['number'])  # Keep only numeric columns
df.groupby('Age')[df_numeric.columns].mean()  # mean per Age group (every Age is unique here, so each group holds one row)

      Age   Salary  Updated_salary
Age
25   25.0  50000.0         60000.0
30   30.0  60000.0         70000.0
35   35.0  70000.0         80000.0

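Grouping by the repeated `Category` values is more informative than grouping by the unique ages. A minimal sketch, where `numeric_only=True` does the job of the `select_dtypes` step and `agg` computes named summaries (the `people` data mirrors the sample and is illustrative):

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                       'Salary': [50000, 60000, 70000],
                       'Category': ['A', 'B', 'A']})

# Mean of every numeric column per category; numeric_only skips the text column 'Name'
print(people.groupby('Category').mean(numeric_only=True))

# Several statistics at once via named aggregation: new_name=(column, function)
summary = people.groupby('Category').agg(avg_salary=('Salary', 'mean'),
                                         headcount=('Name', 'count'))
print(summary)
```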

In [14]:
df['Salary'] = df['Salary'].apply(lambda x: x * 1.05)  # Increase Salary by 5%

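The same 5% increase can be written without `apply`. A sketch comparing both forms on a standalone Series (`salaries` is illustrative): arithmetic on the whole Series runs as one vectorized operation, while `apply` calls the lambda once per element.

```python
import pandas as pd

salaries = pd.Series([50000, 60000, 70000])

via_apply = salaries.apply(lambda x: x * 1.05)  # Python-level call for each element
vectorized = salaries * 1.05                    # single array operation, usually much faster

# Both produce the same float64 values
print(via_apply.equals(vectorized))  # True
```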
In [15]:
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2], 'Salary': [50000, 60000]})

merged_df = pd.merge(df1, df2, on='ID')  # Merge on 'ID'

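`pd.merge` defaults to an inner join, so a row whose `ID` appears in only one table is dropped. A sketch with a hypothetical third employee who has no salary row, using `how='left'` and `indicator=True` to keep and flag that row (`emp` and `pay` are illustrative names):

```python
import pandas as pd

emp = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
pay = pd.DataFrame({'ID': [1, 2], 'Salary': [50000, 60000]})

inner = pd.merge(emp, pay, on='ID')                             # how='inner' by default: IDs 1 and 2 only
left = pd.merge(emp, pay, on='ID', how='left', indicator=True)  # keeps ID 3 with a NaN Salary
print(inner)
print(left)  # the extra '_merge' column says which table(s) each row came from
```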
In [16]:
df.to_csv("output.csv", index=False)  # Save DataFrame to CSV

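To check what `to_csv` writes, the sketch below round-trips a small DataFrame through an in-memory `io.StringIO` buffer instead of `output.csv`, so nothing is written to disk (`people` and `buffer` are illustrative names).

```python
import io
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

buffer = io.StringIO()
people.to_csv(buffer, index=False)  # index=False: don't write the 0, 1, ... row labels as a column
buffer.seek(0)                      # rewind so read_csv starts at the beginning

restored = pd.read_csv(buffer)
print(restored.equals(people))  # True
```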
In [24]:
# Convert text columns to numbers (label encoding)
df['Category'] = df['Category'].astype('category').cat.codes

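`.cat.codes` gives each value the position of its category in the sorted list of categories. A sketch that makes this mapping explicit so codes can be decoded later (the data and the `mapping` dict are illustrative):

```python
import pandas as pd

cats = pd.Series(['A', 'B', 'A', 'C']).astype('category')

# Categories are the sorted unique values; each code is an index into that list
print(list(cats.cat.categories))  # ['A', 'B', 'C']
print(cats.cat.codes.tolist())    # [0, 1, 0, 2]

# Keep the mapping so codes can be turned back into labels later
mapping = dict(enumerate(cats.cat.categories))
print(mapping)                    # {0: 'A', 1: 'B', 2: 'C'}
```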
In [25]:
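# One-hot encoding (for categorical features), usually an alternative to label encoding.
# Here 'Category' already holds the codes 0/1 from the previous cell, so drop_first=True leaves a single 'Category_1' column.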
df = pd.get_dummies(df, columns=['Category'], drop_first=True)

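Label encoding and one-hot encoding are normally alternatives for the same column. A sketch of one-hot encoding applied directly to the original text labels, showing what `drop_first=True` removes (the `items` data is hypothetical):

```python
import pandas as pd

items = pd.DataFrame({'Category': ['A', 'B', 'A', 'C']})

full = pd.get_dummies(items, columns=['Category'])                      # one column per category
reduced = pd.get_dummies(items, columns=['Category'], drop_first=True)  # drops the first, 'Category_A'
print(full.columns.tolist())     # ['Category_A', 'Category_B', 'Category_C']
print(reduced.columns.tolist())  # ['Category_B', 'Category_C']
```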
In [26]:
# Feature scaling: MinMaxScaler rescales Salary to the [0, 1] range
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df[['Salary']] = scaler.fit_transform(df[['Salary']])

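With its default `feature_range=(0, 1)`, `MinMaxScaler` maps each value x to (x - min) / (max - min), computed per column. A pandas-only sketch of that formula on hypothetical salary values; unlike the scikit-learn class, it does not guard against a column whose max equals its min.

```python
import pandas as pd

salary = pd.Series([52500.0, 63000.0, 73500.0])  # hypothetical values

# Min-max normalization: the smallest value maps to 0, the largest to 1
scaled = (salary - salary.min()) / (salary.max() - salary.min())
print(scaled.tolist())  # [0.0, 0.5, 1.0]
```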
In [28]:
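## FOR LARGE DATASETS ##
# Read a large CSV in fixed-size chunks instead of loading it all into memory.
# "large_data.csv" and process() are placeholders: substitute your own file and function.
# chunk_size = 10000
# for chunk in pd.read_csv("large_data.csv", chunksize=chunk_size):
#     process(chunk)  # Process each chunk separately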
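A runnable sketch of the chunked pattern, with a small in-memory CSV standing in for `large_data.csv` and a running total standing in for `process()` (both hypothetical). Only one chunk is held in memory at a time, so aggregates such as a mean have to be accumulated across chunks.

```python
import io
import pandas as pd

# Stand-in for a large file: 10 rows of CSV text held in memory (hypothetical data)
csv_text = "Age,Salary\n" + "\n".join(f"{20 + i},{40000 + 1000 * i}" for i in range(10))

total_salary = 0
row_count = 0
# With chunksize, read_csv yields DataFrames of at most 4 rows each (here 4, 4 and 2)
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total_salary += chunk['Salary'].sum()
    row_count += len(chunk)

print(row_count, total_salary / row_count)  # overall mean built from per-chunk sums
```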