**1. Creating DataFrames**

Definition: Construct a DataFrame or Series from lists, dictionaries, or files.

In [1]:
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24, 27]})


In [2]:
df

Unnamed: 0,Name,Age
0,Alice,24
1,Bob,27
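The definition above also mentions lists and Series, which the cell does not show. A minimal sketch of those routes follows, assuming hypothetical sample data (`Carol`, `Dan`) that is not part of the notebook:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
rows = [['Carol', 31], ['Dan', 45]]

# From a list of lists: column names are supplied separately
df_from_lists = pd.DataFrame(rows, columns=['Name', 'Age'])

# From a list of dictionaries: keys become column names
records = [{'Name': 'Carol', 'Age': 31}, {'Name': 'Dan', 'Age': 45}]
df_from_records = pd.DataFrame(records)

# A Series is a single labelled column
ages = pd.Series([31, 45], index=['Carol', 'Dan'], name='Age')
```

Both DataFrames hold the same data; which constructor form is more convenient usually depends on how the raw data arrives.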


**2. Basic Viewing Operations**

df.head() / df.tail(): View the first or last n rows (5 by default).

In [6]:
df.head(1)



Unnamed: 0,Name,Age
0,Alice,24


In [4]:
df.tail(1)

Unnamed: 0,Name,Age
1,Bob,27


df.shape: Returns a tuple (rows, columns) representing the dimensions of the DataFrame.

In [7]:
df.shape  # Output: (2, 2)


(2, 2)

df.columns / df.index: Shows column names and index.

In [10]:
df.columns  # Output: Index(['Name', 'Age'], dtype='object')



Index(['Name', 'Age'], dtype='object')

In [9]:
df.index  # Output: RangeIndex(start=0, stop=2, step=1)

RangeIndex(start=0, stop=2, step=1)

**3. Basic Summary and Info**

df.describe(): Summary statistics of numerical columns.

df.info(): Overview of columns, data types, and memory usage.

In [11]:
df.describe()



Unnamed: 0,Age
count,2.0
mean,25.5
std,2.12132
min,24.0
25%,24.75
50%,25.5
75%,26.25
max,27.0


In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    2 non-null      object
 1   Age     2 non-null      int64 
dtypes: int64(1), object(1)
memory usage: 160.0+ bytes


**4. Selecting Data**


df['column'] / df[['col1', 'col2']]: Select one or more columns.

In [13]:
df['Name']  # Selects the 'Name' column



Unnamed: 0,Name
0,Alice
1,Bob


In [14]:
df[['Name', 'Age']]  # Selects multiple columns

Unnamed: 0,Name,Age
0,Alice,24
1,Bob,27


df.loc[] / df.iloc[]: Access rows and columns by label (loc) or by integer position (iloc).

In [15]:
df.loc[0, 'Name']  # Output: 'Alice'



'Alice'

In [16]:
df.iloc[0, 1]  # Output: 24

24
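With the default RangeIndex, labels and positions coincide, which hides the difference between the two accessors. The sketch below assumes a hypothetical string index (`'a'`, `'b'`) to make the distinction visible:

```python
import pandas as pd

# Same data as the notebook, but with a hypothetical string index
df_demo = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24, 27]},
                       index=['a', 'b'])

by_label = df_demo.loc['b', 'Age']          # row labelled 'b', column 'Age'
by_position = df_demo.iloc[1, 1]            # second row, second column

label_slice = df_demo.loc['a':'b', 'Name']  # label slices include the end label
pos_slice = df_demo.iloc[0:1, 0]            # position slices exclude the end
```

Note that the label slice returns both rows while the position slice returns only the first.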

**5. Filtering Data**

Boolean Masking: Filter rows based on conditions.

In [17]:
df[df['Age'] > 25]  # Filters rows where 'Age' > 25


Unnamed: 0,Name,Age
1,Bob,27
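Masks can also be combined. The following is a small sketch, using a hypothetical variable name `people` for a copy of the example data, of `&`, `|`, and `isin()`; each condition needs its own parentheses because of operator precedence:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24, 27]})

# AND: both conditions must hold
older_b = people[(people['Age'] > 25) & (people['Name'].str.startswith('B'))]

# OR: either condition may hold
either = people[(people['Age'] < 25) | (people['Name'] == 'Bob')]

# Membership test against a list of allowed values
named = people[people['Name'].isin(['Alice'])]
```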


**6. Data Cleaning**

df.dropna(): Remove rows with missing values.

df.fillna(value): Fill missing values with a specified value.

In [18]:
df.dropna()



Unnamed: 0,Name,Age
0,Alice,24
1,Bob,27


In [19]:
df.fillna(0)

Unnamed: 0,Name,Age
0,Alice,24
1,Bob,27
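The example DataFrame has no missing values, so both calls above return it unchanged. A minimal sketch with a hypothetical frame (`df_missing`) that does contain a `NaN` shows the effect:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age
df_missing = pd.DataFrame({'Name': ['Alice', 'Bob', 'Carol'],
                           'Age': [24, np.nan, 31]})

dropped = df_missing.dropna()            # removes the row for Bob
filled = df_missing.fillna({'Age': 0})   # replaces the missing age with 0
```

Both methods return a new DataFrame; `df_missing` itself still contains the `NaN`.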


df.drop_duplicates(): Removes duplicate rows.

In [20]:
df.drop_duplicates()


Unnamed: 0,Name,Age
0,Alice,24
1,Bob,27
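Since the example data has no duplicates, the output above is identical to the input. The sketch below assumes a hypothetical frame (`df_dupes`) with a repeated row and also illustrates the `subset` and `keep` parameters:

```python
import pandas as pd

# Hypothetical data: the first and third rows are identical
df_dupes = pd.DataFrame({'Name': ['Alice', 'Bob', 'Alice'],
                         'Age': [24, 27, 24]})

# Keep the first occurrence of each fully identical row
unique_rows = df_dupes.drop_duplicates()

# Compare on 'Name' only and keep the last occurrence instead
unique_names = df_dupes.drop_duplicates(subset='Name', keep='last')
```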


df.rename(): Rename columns.

In [21]:
df.rename(columns={'Age': 'Years'})


Unnamed: 0,Name,Years
0,Alice,24
1,Bob,27


**7. Data Transformation**

Adding Columns: Add a new column.

In [22]:
df['Salary'] = [50000, 60000]


In [23]:
df


Unnamed: 0,Name,Age,Salary
0,Alice,24,50000
1,Bob,27,60000


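df.apply(): Apply a function along the rows or columns of a DataFrame. The example below uses Series.apply(), which applies the function to each element of a single column.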

In [24]:
df['Age_Doubled'] = df['Age'].apply(lambda x: x * 2)



In [25]:
df

Unnamed: 0,Name,Age,Salary,Age_Doubled
0,Alice,24,50000,48
1,Bob,27,60000,54
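The cell above applies a function element by element to one column. As a complement, here is a sketch of a row-wise `DataFrame.apply()` with `axis=1`; the `Label` column and the `df_pay` name are hypothetical and chosen only for illustration:

```python
import pandas as pd

df_pay = pd.DataFrame({'Name': ['Alice', 'Bob'],
                       'Age': [24, 27],
                       'Salary': [50000, 60000]})

# axis=1 passes each row to the function as a Series
df_pay['Label'] = df_pay.apply(
    lambda row: f"{row['Name']} ({row['Age']})", axis=1)

# For simple arithmetic, a vectorized expression avoids apply() entirely
df_pay['Age_Doubled'] = df_pay['Age'] * 2
```

Vectorized expressions are generally faster than `apply()` on large frames, so `apply()` is best kept for logic that cannot be expressed column-wise.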


**8. Aggregations and Grouping**

df.groupby(): Group data by column values and aggregate.

In [26]:
df.groupby('Age').sum()


Age,Name,Salary,Age_Doubled
24,Alice,50000,48
27,Bob,60000,54
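Every age in the example is unique, so each group holds a single row and the sum simply echoes the input. A sketch with a hypothetical `Dept` column, where one key repeats, makes the aggregation visible:

```python
import pandas as pd

# Hypothetical data: two rows share the 'Eng' key
staff = pd.DataFrame({'Dept': ['Eng', 'Eng', 'Ops'],
                      'Salary': [50000, 60000, 40000]})

# One aggregate per group
totals = staff.groupby('Dept')['Salary'].sum()

# Several aggregates at once, one output column per function
summary = staff.groupby('Dept')['Salary'].agg(['mean', 'count'])
```

Selecting the `Salary` column before aggregating keeps non-numeric columns out of the calculation.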


df.agg(): Aggregate data with custom functions.

In [27]:
df.agg({'Age': 'mean', 'Salary': 'sum'})


Unnamed: 0,0
Age,25.5
Salary,110000.0



df.pivot_table(): Create a pivot table.

In [29]:
df.pivot_table(values='Salary', index='Age', aggfunc='sum')


Age,Salary
24,50000
27,60000
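A pivot table becomes more useful once a `columns` argument spreads a second key across the header. The following sketch assumes hypothetical `Dept` and `Year` columns that do not appear in the notebook:

```python
import pandas as pd

# Hypothetical data with two grouping keys
sales = pd.DataFrame({'Dept': ['Eng', 'Eng', 'Ops', 'Ops'],
                      'Year': [2023, 2024, 2023, 2024],
                      'Salary': [50000, 60000, 40000, 45000]})

# Rows are departments, columns are years, cells are summed salaries
table = sales.pivot_table(values='Salary', index='Dept',
                          columns='Year', aggfunc='sum')
```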


**9. Merging and Joining**

pd.merge(): Merge two DataFrames on a key.

In [30]:
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24, 27]})
df2 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]})
pd.merge(df1, df2, on='Name')


Unnamed: 0,Name,Age,Salary
0,Alice,24,50000
1,Bob,27,60000


In [31]:
df1

Unnamed: 0,Name,Age
0,Alice,24
1,Bob,27


In [32]:
df2

Unnamed: 0,Name,Salary
0,Alice,50000
1,Bob,60000
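In the example every name appears in both frames, so the join type makes no difference. The sketch below assumes hypothetical frames (`left`, `right`) whose keys only partly overlap, to illustrate the `how` parameter:

```python
import pandas as pd

# Hypothetical frames: only 'Bob' appears in both
left = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24, 27]})
right = pd.DataFrame({'Name': ['Bob', 'Carol'], 'Salary': [60000, 70000]})

inner = pd.merge(left, right, on='Name')                  # default: matching keys only
outer = pd.merge(left, right, on='Name', how='outer')     # all keys from both sides
left_join = pd.merge(left, right, on='Name', how='left')  # all keys from the left frame
```

Rows without a match on the other side receive `NaN` in the columns that come from it.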


pd.concat(): Concatenate DataFrames along a particular axis.

In [33]:
pd.concat([df1, df2], axis=1)  # Concatenates along columns


Unnamed: 0,Name,Age,Name.1,Salary
0,Alice,24,Alice,50000
1,Bob,27,Bob,60000


In [34]:
pd.concat([df1, df2], axis=0)  # Concatenates along rows


Unnamed: 0,Name,Age,Salary
0,Alice,24.0,
1,Bob,27.0,
0,Alice,,50000.0
1,Bob,,60000.0
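The outputs above show two side effects: a duplicated `Name` column with `axis=1` and repeated index labels with `axis=0`. One way to avoid them, sketched here with the same `df1` and `df2` recreated for self-containment:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24, 27]})
df2 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]})

# ignore_index=True renumbers the rows 0..n-1 after stacking
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# Dropping the shared column first avoids a duplicate 'Name' column
side_by_side = pd.concat([df1, df2.drop(columns='Name')], axis=1)
```

Column-wise concatenation aligns on the index rather than on `Name`, so `pd.merge()` is usually the safer choice when rows must be matched by key.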


**10. Date and Time Handling**

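pd.to_datetime(): Convert columns to datetime. The example DataFrame has no 'Date' column, so the calls in this section raise a KeyError; they work once such a column exists.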

In [35]:
df['Date'] = pd.to_datetime(df['Date'])


KeyError: 'Date'

df['Date'].dt: Extract components like day, month, year.



In [36]:
df['Year'] = df['Date'].dt.year


KeyError: 'Date'
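Both cells fail because `df` was never given a `Date` column. A minimal working sketch follows, assuming a separate hypothetical frame (`df_dates`) with made-up date strings:

```python
import pandas as pd

# Hypothetical dates, invented for illustration only
df_dates = pd.DataFrame({'Name': ['Alice', 'Bob'],
                         'Date': ['2024-01-15', '2023-07-04']})

# Convert the string column to datetime64 values
df_dates['Date'] = pd.to_datetime(df_dates['Date'])

# The .dt accessor exposes the individual components
df_dates['Year'] = df_dates['Date'].dt.year
df_dates['Month'] = df_dates['Date'].dt.month
```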

**11. Sorting and Ordering**

df.sort_values(): Sort DataFrame by specified column(s).

In [37]:
df.sort_values(by='Age', ascending=False)


Unnamed: 0,Name,Age,Salary,Age_Doubled
1,Bob,27,60000,54
0,Alice,24,50000,48


In [39]:
df.sort_values(by='Age', ascending=True)


Unnamed: 0,Name,Age,Salary,Age_Doubled
0,Alice,24,50000,48
1,Bob,27,60000,54
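`sort_values()` also accepts several columns, each with its own direction. The sketch below assumes a hypothetical frame (`team`) with a tie in `Age` so that the second key matters:

```python
import pandas as pd

# Hypothetical data: Alice and Carol share the same age
team = pd.DataFrame({'Name': ['Alice', 'Bob', 'Carol'],
                     'Age': [27, 24, 27],
                     'Salary': [50000, 60000, 55000]})

# Age descending first, then Salary ascending to break ties
ranked = team.sort_values(by=['Age', 'Salary'],
                          ascending=[False, True]).reset_index(drop=True)
```

`reset_index(drop=True)` is optional; it renumbers the rows so the index matches the new order.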


**12. Exporting and Importing Data**

pd.read_csv() / df.to_csv(): Read from or write to CSV.

df.to_excel(): Write to an Excel file (needs an Excel engine such as openpyxl installed).

In [40]:
df.to_csv('output.csv', index=False)


In [41]:
df.to_excel('output.xlsx', index=False)
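The cells above only write files. A sketch of the full round trip with `pd.read_csv()` follows; it uses an in-memory `io.StringIO` buffer instead of a file on disk, which is an assumption made here so the example leaves nothing behind:

```python
import io

import pandas as pd

df_out = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24, 27]})

# Write CSV text into an in-memory buffer instead of a file path
buffer = io.StringIO()
df_out.to_csv(buffer, index=False)

# Rewind the buffer and read the data back
buffer.seek(0)
df_in = pd.read_csv(buffer)
```

Passing `index=False` when writing avoids an extra unnamed index column when the file is read back.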
