# Pandas Assignment

Q1. How do you load a CSV file into a Pandas DataFrame?

To read the file, we can use the read_csv function from Pandas. The Pandas library must be installed first (for example with pip install pandas).
Once it is installed, we can import the pandas module. By convention, it is imported with the alias pd.

In [59]:
import pandas as pd

# Load the CSV file into a DataFrame
df = pd.read_csv('./filename.csv')

# Display the DataFrame
print(df)


     Name  Age  Income
0   Ahmad   25   50000
1   Haris   30   60000
2  Junaid   35   70000
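
read_csv accepts any file-like object as well as a path, which makes it easy to try out without a file on disk. The following is a minimal sketch under that assumption: the CSV text is made up to match the output above, and usecols and nrows are shown as two commonly used optional parameters.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for './filename.csv'
csv_text = "Name,Age,Income\nAhmad,25,50000\nHaris,30,60000\nJunaid,35,70000\n"

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Load only selected columns (usecols) and only the first rows (nrows)
subset = pd.read_csv(io.StringIO(csv_text), usecols=['Name', 'Income'], nrows=2)

print(df.shape)
print(subset)
```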


Q2. How do you check the data type of a column in a Pandas DataFrame?

In [60]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['Ahmad', 'Haris', 'Junaid'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}
df = pd.DataFrame(data)

# Check the data type of the 'Age' column
print(df['Age'].dtype)


int64


Q3. How do you select rows from a Pandas DataFrame based on a condition?

In [61]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['Ahmad', 'Haris', 'Junaid'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}
df = pd.DataFrame(data)

# Select rows where the age is greater than 30
result = df[df['Age'] > 30]

# Display the result
print(result)


     Name  Age  Salary
2  Junaid   35   70000


Q4. How do you rename columns in a Pandas DataFrame?

In [62]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['Ahmad', 'Haris', 'Junaid'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}
df = pd.DataFrame(data)

# Rename the 'Salary' column to 'Income'
df = df.rename(columns={'Salary': 'Income'})

# Display the renamed DataFrame
print(df)


     Name  Age  Income
0   Ahmad   25   50000
1   Haris   30   60000
2  Junaid   35   70000


Q5. How do you drop columns in a Pandas DataFrame?

In [63]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['Ahmad', 'Haris', 'Junaid'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}
df = pd.DataFrame(data)

# Drop the 'Salary' column
df = df.drop(columns=['Salary'])

# Display the modified DataFrame
print(df)


     Name  Age
0   Ahmad   25
1   Haris   30
2  Junaid   35


Q6. How do you find the unique values in a column of a Pandas DataFrame?

In [64]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['Ahmad', 'Haris', 'Junaid', 'Talha'],
        'Age': [25, 30, 35, 25],
        'Salary': [50000, 60000, 70000, 50000]}
df = pd.DataFrame(data)

# Find the unique values in the 'Name' column
unique_names = df['Name'].unique()

# Display the unique values
print(unique_names)

['Ahmad' 'Haris' 'Junaid' 'Talha']


Q7. How do you find the number of missing values in each column of a Pandas DataFrame?

In [65]:
import pandas as pd

# Create a DataFrame with some missing values
data = {'Name': ['Ahmad', 'Haris', 'Junaid', None],
        'Age': [25, 30, None, 25],
        'Salary': [50000, None, 70000, 50000]}
df = pd.DataFrame(data)

# Find the number of missing values in each column
missing_values = df.isnull().sum()

# Display the number of missing values
print(missing_values)


Name      1
Age       1
Salary    1
dtype: int64


Q8. How do you fill missing values in a Pandas DataFrame with a specific value?

In [66]:
import pandas as pd

# Create a DataFrame with some missing values
data = {'Name': ['Ahmad', 'Haris', 'Junaid', None],
        'Age': [25, 30, None, 25],
        'Salary': [50000, None, 70000, 50000]}
df = pd.DataFrame(data)

# Fill missing values with a specific value for each column
df = df.fillna({'Name': 'Unknown', 'Age': 0, 'Salary': 0})

# Display the modified DataFrame
print(df)


      Name   Age   Salary
0    Ahmad  25.0  50000.0
1    Haris  30.0      0.0
2   Junaid   0.0  70000.0
3  Unknown  25.0  50000.0


Q9. How do you concatenate two Pandas DataFrames?

In [67]:
import pandas as pd

# Create two DataFrames
data1 = {'Name': ['Ahmad', 'Haris', 'Junaid'],
         'Age': [25, 30, 35]}
df1 = pd.DataFrame(data1)

data2 = {'Name': ['Khalfan', 'Wahab', 'Imran'],
         'Age': [40, 45, 50]}
df2 = pd.DataFrame(data2)

# Concatenate the two DataFrames
result = pd.concat([df1, df2])

# Display the concatenated DataFrame
print(result)

      Name  Age
0    Ahmad   25
1    Haris   30
2   Junaid   35
0  Khalfan   40
1    Wahab   45
2    Imran   50
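
In the output above, the row labels 0, 1, 2 repeat because concat keeps each DataFrame's original index. If a fresh 0..n-1 index is preferred, passing ignore_index=True is one option. A small sketch with shortened, illustrative data:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Ahmad', 'Haris'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Khalfan', 'Wahab'], 'Age': [40, 45]})

# ignore_index=True discards the original row labels and renumbers the rows
result = pd.concat([df1, df2], ignore_index=True)

print(result)
```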


Q10. How do you merge two Pandas DataFrames on a specific column?

In [68]:
import pandas as pd

# Create two DataFrames
data1 = {'Name': ['Ahmad', 'Haris', 'Junaid'],
         'Age': [25, 30, 35],
         'Salary': [50000, 60000, 70000]}
df1 = pd.DataFrame(data1)

data2 = {'Name': ['Ahmad', 'Haris', 'Junaid'],
         'Dept': ['HR', 'Finance', 'IT']}
df2 = pd.DataFrame(data2)

# Merge the two DataFrames on the 'Name' column
# (the default inner join keeps only names present in both DataFrames)
result = pd.merge(df1, df2, on='Name')

# Display the merged DataFrame
print(result)


     Name  Age  Salary     Dept
0   Ahmad   25   50000       HR
1   Haris   30   60000  Finance
2  Junaid   35   70000       IT
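
pd.merge performs an inner join by default, so only Name values that appear in both DataFrames end up in the result. The how parameter selects other join types. The sketch below uses illustrative frames (employees and departments are hypothetical names) in which only some names overlap:

```python
import pandas as pd

# Illustrative data: 'Ahmad' has no department, 'Imran' has no salary
employees = pd.DataFrame({'Name': ['Ahmad', 'Haris', 'Junaid'],
                          'Salary': [50000, 60000, 70000]})
departments = pd.DataFrame({'Name': ['Haris', 'Junaid', 'Imran'],
                            'Dept': ['Finance', 'IT', 'HR']})

# Inner join (default): only names present in both frames
inner = pd.merge(employees, departments, on='Name')

# Left join: every row of employees, with NaN where no department matches
left = pd.merge(employees, departments, on='Name', how='left')

# Outer join: all names from both frames; indicator=True adds a '_merge'
# column recording where each row came from
outer = pd.merge(employees, departments, on='Name', how='outer', indicator=True)

print(inner)
print(left)
print(outer)
```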


Q11. How do you group data in a Pandas DataFrame by a specific column and apply an aggregation function?

In [69]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['Ahmad', 'Haris', 'Junaid', 'Khalfan', 'Wahab', 'Imran'],
        'Dept': ['HR', 'Finance', 'IT', 'IT', 'HR', 'Finance'],
        'Salary': [50000, 60000, 70000, 80000, 55000, 65000]}
df = pd.DataFrame(data)

# Group the DataFrame by 'Dept' and calculate the mean of 'Salary' for each group
result = df.groupby('Dept')['Salary'].mean()

# Display the result
print(result)


Dept
Finance    62500.0
HR         52500.0
IT         75000.0
Name: Salary, dtype: float64


Q12. How do you pivot a Pandas DataFrame?

In [70]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['Ahmad', 'Haris', 'Junaid', 'Khalfan', 'Wahab', 'Imran'],
        'Dept': ['HR', 'Finance', 'IT', 'IT', 'HR', 'Finance'],
        'Salary': [50000, 60000, 70000, 80000, 55000, 65000]}
df = pd.DataFrame(data)

# Pivot the DataFrame
result = df.pivot(index='Name', columns='Dept', values='Salary')

# Display the result
print(result)


Dept     Finance       HR       IT
Name                              
Ahmad        NaN  50000.0      NaN
Haris    60000.0      NaN      NaN
Imran    65000.0      NaN      NaN
Junaid       NaN      NaN  70000.0
Khalfan      NaN      NaN  80000.0
Wahab        NaN  55000.0      NaN
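
pivot requires every index/column pair to be unique, which holds above because each name appears once. When a pair repeats, pivot raises a ValueError, and pivot_table with an aggregation function is the usual alternative. A minimal sketch, assuming a hypothetical 'Year' column that is not part of the original data:

```python
import pandas as pd

# Illustrative data: the pair ('HR', 2022) appears twice
df = pd.DataFrame({'Dept': ['HR', 'HR', 'IT', 'IT'],
                   'Year': [2022, 2022, 2022, 2023],
                   'Salary': [50000, 55000, 70000, 80000]})

# pivot_table aggregates duplicate pairs instead of raising an error
result = df.pivot_table(index='Dept', columns='Year', values='Salary', aggfunc='mean')

print(result)
```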


Q13. How do you change the data type of a column in a Pandas DataFrame?

In [71]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['Ahmad', 'Haris', 'Junaid', 'Khalfan', 'Wahab', 'Imran'],
        'Age': ['35', '27', '41', '22', '38', '29']}
df = pd.DataFrame(data)

# Convert the 'Age' column to integer
df['Age'] = df['Age'].astype(int)

# Display the DataFrame
print(df)


      Name  Age
0    Ahmad   35
1    Haris   27
2   Junaid   41
3  Khalfan   22
4    Wahab   38
5    Imran   29
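
astype(int) raises a ValueError if any value cannot be parsed as an integer. For data that may contain such entries, pd.to_numeric with errors='coerce' is one possible approach: it turns unparseable values into NaN. The 'unknown' entry below is an illustrative assumption.

```python
import pandas as pd

# Illustrative data with one value that is not a number
df = pd.DataFrame({'Age': ['35', '27', 'unknown', '22']})

# Unparseable values become NaN, so the resulting column is float, not int
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')

print(df)
print(df['Age'].dtype)
```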


Q14. How do you sort a Pandas DataFrame by a specific column?

In [72]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['Ahmad', 'Haris', 'Junaid', 'Khalfan', 'Wahab', 'Imran'],
        'Age': [35, 27, 41, 22, 38, 29],
        'Salary': [50000, 60000, 70000, 80000, 55000, 65000]}
df = pd.DataFrame(data)

# Sort the DataFrame by the 'Salary' column
df_sorted = df.sort_values('Salary')

# Display the sorted DataFrame
print(df_sorted)


      Name  Age  Salary
0    Ahmad   35   50000
4    Wahab   38   55000
1    Haris   27   60000
5    Imran   29   65000
2   Junaid   41   70000
3  Khalfan   22   80000


Q15. How do you create a copy of a Pandas DataFrame?

In [73]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['Ahmad', 'Haris', 'Junaid', 'Khalfan', 'Wahab', 'Imran'],
        'Age': [35, 27, 41, 22, 38, 29],
        'Salary': [50000, 60000, 70000, 80000, 55000, 65000]}
df = pd.DataFrame(data)

# Create a copy of the DataFrame
df_copy = df.copy()

# Display the original DataFrame and its copy
print('Original DataFrame:')
print(df)
print('\nCopy of the DataFrame:')
print(df_copy)


Original DataFrame:
      Name  Age  Salary
0    Ahmad   35   50000
1    Haris   27   60000
2   Junaid   41   70000
3  Khalfan   22   80000
4    Wahab   38   55000
5    Imran   29   65000

Copy of the DataFrame:
      Name  Age  Salary
0    Ahmad   35   50000
1    Haris   27   60000
2   Junaid   41   70000
3  Khalfan   22   80000
4    Wahab   38   55000
5    Imran   29   65000
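
The two printouts above look identical, so they do not show why copy() matters. The sketch below contrasts a copy with plain assignment, which only creates a second name for the same object (alias is a hypothetical variable name):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ahmad', 'Haris'], 'Age': [35, 27]})

alias = df            # a second name for the same DataFrame object
df_copy = df.copy()   # an independent copy (deep=True is the default)

# Modify the original
df.loc[0, 'Age'] = 99

print(alias.loc[0, 'Age'])    # follows the change, since alias is df
print(df_copy.loc[0, 'Age'])  # still holds the original value
```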


Q16. How do you filter rows of a Pandas DataFrame by multiple conditions?

In [74]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['Ahmad', 'Haris', 'Junaid', 'Khalfan', 'Wahab', 'Imran'],
        'Age': [35, 27, 41, 22, 38, 29],
        'Salary': [50000, 60000, 70000, 80000, 55000, 65000]}
df = pd.DataFrame(data)

# Filter the DataFrame by multiple conditions
filtered_df = df[(df['Age'] > 30) & (df['Salary'] < 70000)]

# Display the filtered DataFrame
print(filtered_df)


    Name  Age  Salary
0  Ahmad   35   50000
4  Wahab   38   55000
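
Besides & (and), boolean masks can be combined with | (or) and inverted with ~ (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A short sketch on the same data, with isin shown as a way to match against a list of values:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ahmad', 'Haris', 'Junaid', 'Khalfan', 'Wahab', 'Imran'],
                   'Age': [35, 27, 41, 22, 38, 29],
                   'Salary': [50000, 60000, 70000, 80000, 55000, 65000]})

# OR: rows where either condition holds
either = df[(df['Age'] < 25) | (df['Salary'] >= 70000)]

# NOT: rows where the condition does not hold
not_under_30 = df[~(df['Age'] < 30)]

# isin: rows whose value is in a given list
by_name = df[df['Name'].isin(['Haris', 'Imran'])]

print(either)
print(not_under_30)
print(by_name)
```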


Q17. How do you calculate the mean of a column in a Pandas DataFrame?

In [75]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['Ahmad', 'Haris', 'Junaid', 'Khalfan', 'Wahab', 'Imran'],
        'Age': [35, 27, 41, 22, 38, 29],
        'Salary': [50000, 60000, 70000, 80000, 55000, 65000]}
df = pd.DataFrame(data)

# Calculate the mean of the 'Salary' column
mean_salary = df['Salary'].mean()

# Print the mean to the console
print(mean_salary)


63333.333333333336


Q18. How do you calculate the standard deviation of a column in a Pandas DataFrame?

In [76]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['Ahmad', 'Haris', 'Junaid', 'Khalfan', 'Wahab', 'Imran'],
        'Age': [35, 27, 41, 22, 38, 29],
        'Salary': [50000, 60000, 70000, 80000, 55000, 65000]}
df = pd.DataFrame(data)

# Calculate the standard deviation of the 'Salary' column
std_salary = df['Salary'].std()

# Print the standard deviation to the console
print(std_salary)


10801.234497346433


Q19. How do you calculate the correlation between two columns in a Pandas DataFrame?

In [77]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['Ahmad', 'Haris', 'Junaid', 'Khalfan', 'Wahab', 'Imran'],
        'Age': [35, 27, 41, 22, 38, 29],
        'Salary': [50000, 60000, 70000, 80000, 55000, 65000],
        'Sales': [10, 20, 30, 40, 50, 60]}
df = pd.DataFrame(data)

# Calculate the correlation between 'Salary' and 'Sales'
corr = df['Salary'].corr(df['Sales'])

# Print the correlation to the console
print(corr)


0.3464101615137754


Q20. How do you select specific columns in a DataFrame using their labels?

In [78]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['Ahmad', 'Haris', 'Junaid', 'Khalfan', 'Wahab', 'Imran'],
        'Age': [35, 27, 41, 22, 38, 29],
        'Salary': [50000, 60000, 70000, 80000, 55000, 65000],
        'Sales': [10, 20, 30, 40, 50, 60]}
df = pd.DataFrame(data)

# Select specific columns using their labels
selected_cols = df.loc[:, ['Name', 'Salary', 'Sales']]

# Print the selected columns to the console
print(selected_cols)


      Name  Salary  Sales
0    Ahmad   50000     10
1    Haris   60000     20
2   Junaid   70000     30
3  Khalfan   80000     40
4    Wahab   55000     50
5    Imran   65000     60


Q21. How do you select specific rows in a DataFrame using their indexes?

In [79]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['Ahmad', 'Haris', 'Junaid', 'Khalfan', 'Wahab', 'Imran'],
        'Age': [35, 27, 41, 22, 38, 29],
        'Salary': [50000, 60000, 70000, 80000, 55000, 65000],
        'Sales': [10, 20, 30, 40, 50, 60]}
df = pd.DataFrame(data)

# Select specific rows using their indexes
selected_rows = df.iloc[[1, 3, 5]]

# Print the selected rows to the console
print(selected_rows)


      Name  Age  Salary  Sales
1    Haris   27   60000     20
3  Khalfan   22   80000     40
5    Imran   29   65000     60


Q22. How do you sort a DataFrame by a specific column?

In [80]:
import pandas as pd

# Create a DataFrame
data = {'Name': ['Ahmad', 'Haris', 'Junaid', 'Khalfan', 'Wahab', 'Imran'],
        'Age': [35, 27, 41, 22, 38, 29],
        'Salary': [50000, 60000, 70000, 80000, 55000, 65000],
        'Sales': [10, 20, 30, 40, 50, 60]}
df = pd.DataFrame(data)

# Sort the DataFrame by the 'Age' column
df_sorted = df.sort_values(by='Age')

# Print the sorted DataFrame to the console
print(df_sorted)


      Name  Age  Salary  Sales
3  Khalfan   22   80000     40
1    Haris   27   60000     20
5    Imran   29   65000     60
0    Ahmad   35   50000     10
4    Wahab   38   55000     50
2   Junaid   41   70000     30


Q23. How do you create a new column in a DataFrame based on the values of another column?

In [81]:
# A new column can be derived from existing columns with vectorized
# arithmetic (used below) or, for arbitrary element-wise logic, with apply:
#
# df['new_column'] = df['existing_column'].apply(your_function)

import pandas as pd

# Create a DataFrame
data = {'price': [10, 20, 30, 40],
        'quantity': [5, 10, 15, 20]}
df = pd.DataFrame(data)

# Create a new column called 'revenue'
df['revenue'] = df['price'] * df['quantity']

# Print the updated DataFrame
print(df)


   price  quantity  revenue
0     10         5       50
1     20        10      200
2     30        15      450
3     40        20      800
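
For logic that is not simple arithmetic, apply calls a function once per value of a column. The sketch below is illustrative only: price_band and its threshold of 30 are hypothetical, and a vectorized expression is included for comparison.

```python
import pandas as pd

df = pd.DataFrame({'price': [10, 20, 30, 40]})

def price_band(price):
    """Hypothetical helper: label a price as 'low' or 'high'."""
    return 'high' if price >= 30 else 'low'

# apply calls price_band once for each value in the column
df['band'] = df['price'].apply(price_band)

# Plain arithmetic is vectorized and needs no apply
df['double_price'] = df['price'] * 2

print(df)
```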


Q24. How do you remove duplicates from a DataFrame?

To remove duplicates from a Pandas DataFrame, you can use the drop_duplicates() method. By default, it treats two rows as duplicates only when they match in every column, keeps the first occurrence, and removes the later ones.

The basic syntax for drop_duplicates() is as follows:

df.drop_duplicates()

In [82]:
import pandas as pd

# Create a DataFrame with duplicate rows
data = {'name': ['Ahmad', 'Haris', 'Junaid', 'Ahmad', 'Wahab', 'Imran'],
        'age': [25, 30, 35, 25, 40, 30],
        'city': ['Charsadda', 'Mardan', 'Hangu', 'Charsadda', 'Peshawar', 'Kohat']}
df = pd.DataFrame(data)

# Remove duplicate rows (row 3 repeats row 0)
df = df.drop_duplicates()

# Print the updated DataFrame
print(df)


     name  age       city
0   Ahmad   25  Charsadda
1   Haris   30     Mardan
2  Junaid   35      Hangu
4   Wahab   40   Peshawar
5   Imran   30      Kohat
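
drop_duplicates also takes a subset parameter, to compare only some columns, and a keep parameter, to choose which occurrence survives. A minimal sketch with illustrative data:

```python
import pandas as pd

# Illustrative data: row 2 repeats row 0 exactly; row 3 repeats only the name
df = pd.DataFrame({'name': ['Ahmad', 'Haris', 'Ahmad', 'Ahmad'],
                   'city': ['Charsadda', 'Mardan', 'Charsadda', 'Peshawar']})

# Default: compare all columns, keep the first occurrence
full = df.drop_duplicates()

# Compare only the 'name' column
by_name_first = df.drop_duplicates(subset=['name'])

# Keep the last occurrence of each name instead of the first
by_name_last = df.drop_duplicates(subset=['name'], keep='last')

# keep=False drops every row whose name occurs more than once
no_repeats = df.drop_duplicates(subset=['name'], keep=False)

print(full)
print(by_name_first)
print(by_name_last)
print(no_repeats)
```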

Q25. What is the difference between .loc and .iloc in Pandas?

In Pandas, both .loc and .iloc are used to select specific rows and columns from a DataFrame. They differ in how rows and columns are addressed:

.loc is label-based: it selects data by row and column labels, and a label slice includes its end label.

.iloc is integer-position-based: it selects data by the 0-based position of rows and columns, and a position slice excludes its end position, as in ordinary Python slicing.

Here are some examples that demonstrate the difference between .loc and .iloc:

In [None]:
import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'name': ['Ahmad', 'Haris', 'Junaid', 'Khalfan'],
                   'age': [25, 30, 35, 40],
                   'city': ['Charsadda', 'Mardan', 'Hangu', 'Peshawar']})


In [None]:
# To select the value in the first row of the 'name' column,
# use positions with .iloc or labels with .loc:

print(df.iloc[0, 0])   # Output: Ahmad
print(df.loc[0, 'name'])   # Output: Ahmad


Ahmad
Ahmad


In [None]:
# You can also use .loc and .iloc to select multiple rows and columns. For example,
# to select the first two rows and all columns using .iloc (end position 2 is excluded):
print(df.iloc[:2, :])

# To select the same rows using .loc (end label 1 is included):
print(df.loc[:1, :])

    name  age       city
0  Ahmad   25  Charsadda
1  Haris   30     Mardan
    name  age       city
0  Ahmad   25  Charsadda
1  Haris   30     Mardan
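
With the default 0, 1, 2, ... index, labels and positions coincide, so .loc and .iloc can look interchangeable. The difference becomes visible once the index is something else. A small sketch, assuming an illustrative index of 10, 20, 30:

```python
import pandas as pd

# Illustrative DataFrame whose row labels differ from row positions
df = pd.DataFrame({'name': ['Ahmad', 'Haris', 'Junaid'],
                   'age': [25, 30, 35]},
                  index=[10, 20, 30])

by_label = df.loc[10, 'name']    # row labelled 10
by_position = df.iloc[0, 0]      # row at position 0

label_slice = df.loc[10:20]      # labels 10 through 20, end label included
position_slice = df.iloc[0:2]    # positions 0 and 1, end position excluded

print(by_label, by_position)
print(label_slice)
print(position_slice)

# df.loc[0] would raise a KeyError here, because no row is labelled 0
```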
