**Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.**

In [1]:
import pandas as pd

# Creating a Pandas Series with the given data
data = [4, 8, 15, 16, 23, 42]
series = pd.Series(data)

# Printing the created series
print(series)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


**Q2. Create a variable of list type containing 10 elements, apply the pandas.Series function to it, and print the result.**

In [2]:
import pandas as pd

# Creating a list with 10 elements
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Creating a Pandas Series using the list
series = pd.Series(my_list)

# Printing the created series
print(series)

0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64


**Q3. Create a Pandas DataFrame that contains the following data:**
    
![Screenshot%202023-08-19%20155532.png](attachment:Screenshot%202023-08-19%20155532.png)

**Then, print the DataFrame.**

In [3]:
import pandas as pd

# Creating a dictionary with the given data
data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}

# Creating a Pandas DataFrame using the dictionary
df = pd.DataFrame(data)

# Printing the created DataFrame
print(df)


     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


**Q4. What is a ‘DataFrame’ in pandas and how is it different from pandas.Series? Explain with an example.**

A `DataFrame` in pandas is a `two-dimensional`, `size-mutable`, and `heterogeneous` tabular data structure. It can be thought of as a table with rows and columns, similar to a spreadsheet or a database table. Each column in a DataFrame can contain different types of data (numeric, string, boolean, etc.), making it more versatile for handling `complex` data.

A `Series`, on the other hand, is a `one-dimensional array-like` object with a labeled index that can hold any data type. It is similar to a single column in a DataFrame or spreadsheet.

Here's an example to illustrate the difference:

In [4]:
import pandas as pd

# Creating a Series
series_data = pd.Series([4, 8, 15, 16, 23, 42], name="Numbers")

# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}
df = pd.DataFrame(data)

print("Series:")
print(series_data)
print("\nDataFrame:")
print(df)

Series:
0     4
1     8
2    15
3    16
4    23
5    42
Name: Numbers, dtype: int64

DataFrame:
     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female
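
One way to see the relationship between the two structures is to pull a single column out of a DataFrame: the result is a Series. The following is a minimal sketch that reuses the sample data from this answer; the variable names `age_column` and `subset` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
})

# Selecting one column returns a one-dimensional Series
age_column = df['Age']
print(type(age_column).__name__)  # Series
print(age_column.ndim)            # 1

# Selecting a list of columns returns a two-dimensional DataFrame
subset = df[['Name', 'Age']]
print(type(subset).__name__)      # DataFrame
print(subset.ndim)                # 2

# Each column keeps its own dtype, which is what makes a DataFrame heterogeneous
print(df.dtypes)
```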


**Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?**

In [5]:
# Importing the pandas library
import pandas as pd

# Creating a dictionary with the given data
data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}

# Creating a Pandas DataFrame using the dictionary
df = pd.DataFrame(data)

# Printing the created DataFrame
print("Original DataFrame:")
print(df)

# Sorting the DataFrame by 'Age' column in ascending order
sorted_df = df.sort_values(by='Age')
print("\nSorted DataFrame:")
print(sorted_df)

# Filtering the DataFrame to select rows where Age is greater than 25
filtered_df = df[df['Age'] > 25]
print("\nFiltered DataFrame (Age > 25):")
print(filtered_df)

# Adding a new column 'Nationality' to the DataFrame
df['Nationality'] = ['USA', 'Canada', 'France']
print("\nDataFrame with Added Column:")
print(df)

# Grouping the DataFrame by 'Gender' and calculating the mean age within each group
grouped_df = df.groupby('Gender')['Age'].mean()
print("\nGrouped DataFrame - Mean Age by Gender:")
print(grouped_df)

# Applying a custom function to the 'Age' column to categorize ages
def age_category(age):
    if age < 30:
        return 'Young'
    else:
        return 'Adult'

df['Age Category'] = df['Age'].apply(age_category)
print("\nDataFrame with Age Categories:")
print(df)

# Displaying the first 2 rows of the DataFrame
print("Head:")
print(df.head(2))

# Displaying the last 2 rows of the DataFrame
print("\nTail:")
print(df.tail(2))

# Getting the dimensions of the DataFrame
print("\nShape:")
print(df.shape)

# Getting the list of column names
print("\nColumns:")
print(df.columns.tolist())

# Getting the index (row labels) of the DataFrame
print("\nIndex:")
print(df.index.tolist())

# Providing a concise summary of the DataFrame
print("\nInfo:")
df.info()

# Generating descriptive statistics for numerical columns
print("\nDescribe:")
print(df.describe())

# Getting unique values in the 'Age' column
print("\nUnique values in Age:")
print(df['Age'].unique())

# Getting the number of unique values in each column
print("\nNumber of unique values in each column:")
print(df.nunique())

# Counting the occurrences of each unique value in the 'Gender' column
print("\nValue counts for Gender:")
print(df['Gender'].value_counts())

# Accessing a specific cell using row and column labels
print("\nAccessing cell using loc:")
print(df.loc[1, 'Age'])

# Accessing a specific cell using integer row and column indices
print("\nAccessing cell using iloc:")
print(df.iloc[1, 2])

# Removing the 'Nationality' column
print("\nDataFrame after dropping 'Nationality' column:")
df_drop = df.drop(columns=['Nationality'])
print(df_drop)

# Filling missing values in the DataFrame (this data has none, so the result is unchanged)
print("\nDataFrame after filling missing values:")
df_filled = df.fillna(value=0)
print(df_filled)

# Checking for missing values in the DataFrame
print("\nChecking for missing values:")
print(df.isnull())

# Applying a function to each element of the 'Age' column
print("\nApplying a function to 'Age' column:")
age_squared = df['Age'].apply(lambda x: x ** 2)
print(age_squared)

# Applying a function to each element of the entire DataFrame
# (applymap is deprecated since pandas 2.1 in favor of DataFrame.map)
print("\nApplying a function to the entire DataFrame:")
df_applymap = df.applymap(lambda x: str(x) + "_modified")
print(df_applymap)

# Merging two DataFrames based on a common column
print("\nMerging two DataFrames:")
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Value': ['A', 'B', 'C']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Value': ['X', 'Y', 'Z']})
merged_df = pd.merge(df1, df2, on='ID')
print(merged_df)

# Creating a pivot table from the DataFrame
print("\nPivot Table:")
pivot_table = df.pivot_table(values='Age', index='Gender', columns='Name', aggfunc='mean')
print(pivot_table)

# Grouping the DataFrame and applying several aggregation functions
print("\nGrouped DataFrame - Mean, Min, and Max Age by Gender:")
grouped_df = df.groupby('Gender')['Age'].agg(['mean', 'min', 'max'])
print(grouped_df)

# Sorting the DataFrame by the 'Age' column in descending order
print("\nSorted DataFrame:")
sorted_df_desc = df.sort_values(by='Age', ascending=False)
print(sorted_df_desc)

# Removing duplicate rows based on the 'Age' and 'Gender' columns
print("\nDataFrame after dropping duplicates:")
df_no_duplicates = df.drop_duplicates(subset=['Age', 'Gender'])
print(df_no_duplicates)

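# Computing the correlation matrix for numerical columns
# (numeric_only=True skips the string columns, which would raise an error in pandas 2.0+)
print("\nCorrelation Matrix:")
correlation_matrix = df.corr(numeric_only=True)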
print(correlation_matrix)


Original DataFrame:
     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female

Sorted DataFrame:
     Name  Age  Gender
0   Alice   25  Female
2  Claire   27  Female
1     Bob   30    Male

Filtered DataFrame (Age > 25):
     Name  Age  Gender
1     Bob   30    Male
2  Claire   27  Female

DataFrame with Added Column:
     Name  Age  Gender Nationality
0   Alice   25  Female         USA
1     Bob   30    Male      Canada
2  Claire   27  Female      France

Grouped DataFrame - Mean Age by Gender:
Gender
Female    26.0
Male      30.0
Name: Age, dtype: float64

DataFrame with Age Categories:
     Name  Age  Gender Nationality Age Category
0   Alice   25  Female         USA        Young
1     Bob   30    Male      Canada        Adult
2  Claire   27  Female      France        Young
Head:
    Name  Age  Gender Nationality Age Category
0  Alice   25  Female         USA        Young
1    Bob   30    Male      Canada        Adult

Tail:
     Name  Age  Gender N

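
As for when one of these functions is useful: `isnull()` and `fillna()` matter most when the data actually contains gaps, which the sample data above does not. The sketch below assumes a hypothetical variant of the data in which Bob's age is missing; the `people` DataFrame and the choice to fill with the mean are illustrative assumptions, not part of the original exercise.

```python
import pandas as pd
import numpy as np

# Hypothetical data with one missing age (not part of the original exercise)
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, np.nan, 27]
})

# Count missing values per column
missing_counts = people.isnull().sum()
print(missing_counts)

# Replace the missing age with the mean of the known ages
mean_age = people['Age'].mean()
people_filled = people.fillna({'Age': mean_age})
print(people_filled)
```

Passing a dictionary to `fillna` limits the replacement to the named column, so a missing value in `Name` would not be overwritten with a number.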


**Q6. Which of the following is mutable in nature: Series, DataFrame, Panel?**

Mutability refers to whether an object can be changed after it is created. `Series` and `DataFrame` are both `mutable`: their values can be modified in place. `Panel` was also mutable, but it has since been removed from pandas.

In [6]:
# Importing the necessary libraries
import pandas as pd

# Mutable nature of Series, DataFrame, and Panel
# Series: mutable (values can be modified after creation)
# DataFrame: mutable (values can be modified after creation)
# Panel: was mutable, but is no longer available in current pandas

# Creating a Series
data_series = pd.Series([10, 20, 30, 40, 50])

# Modifying a value in the Series
data_series[2] = 35

# Displaying the modified Series
print("Modified Series:")
print(data_series)

# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27]
}
df = pd.DataFrame(data)

# Modifying a value in the DataFrame
df.at[1, 'Age'] = 32

# Displaying the modified DataFrame
print("\nModified DataFrame:")
print(df)

Modified Series:
0    10
1    20
2    35
3    40
4    50
dtype: int64

Modified DataFrame:
     Name  Age
0   Alice   25
1     Bob   32
2  Claire   27
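
Mutability also covers size, not only values: columns can be added to or removed from an existing DataFrame. A minimal sketch, assuming the same sample data; the `Country` column is an illustrative addition.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Claire'], 'Age': [25, 30, 27]})
print(df.shape)  # (3, 2)

# Adding a column changes the existing DataFrame in place
df['Country'] = ['USA', 'Canada', 'UK']
print(df.shape)  # (3, 3)

# Dropping the column in place restores the original shape
df.drop(columns=['Country'], inplace=True)
print(df.shape)  # (3, 2)
```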


In [7]:
# In older versions of pandas, a `Panel` was a 3D data structure used to handle heterogeneous data.
# It could be thought of as a container for multiple DataFrames (similar to how DataFrame is a container for Series).
# However, due to its complexity and limited use cases, it was deprecated in pandas 0.20 and removed in pandas 0.25.
# The recommended replacement is a DataFrame with a MultiIndex.

# Example of using a Panel (for educational purposes only; on pandas 0.25 and later this raises AttributeError):

# Importing the necessary libraries
import pandas as pd

# Creating a Panel
data_panel = pd.Panel({
    'Item1': pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}),
    'Item2': pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]})
})

# Displaying the Panel
print("Original Panel:")
print(data_panel)

# Modifying a value in the Panel (not recommended due to deprecation)
data_panel['Item1'].at[2, 'B'] = 30

# Displaying the modified Panel (not recommended due to deprecation)
print("\nModified Panel:")
print(data_panel)


AttributeError: module 'pandas' has no attribute 'Panel'
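
Since `Panel` is no longer available, the same two items can be held in a single DataFrame with a MultiIndex. The following is a sketch of one possible approach using `pd.concat` with a dictionary of DataFrames; the level names `Item` and `Row` are assumptions chosen for illustration.

```python
import pandas as pd

item1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
item2 = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]})

# Stack the two frames under an outer index level named 'Item'
stacked = pd.concat({'Item1': item1, 'Item2': item2}, names=['Item', 'Row'])
print(stacked)

# Modify a single value: item 'Item1', row 2, column 'B'
stacked.loc[('Item1', 2), 'B'] = 30

# Retrieve one item as an ordinary DataFrame
print(stacked.loc['Item1'])
```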

**Q7. Create a DataFrame using multiple Series. Explain with an example.**

In [8]:
# Importing the necessary library
import pandas as pd

# Creating Series for different columns of the DataFrame
names_series = pd.Series(['Alice', 'Bob', 'Claire'])
age_series = pd.Series([25, 30, 27])
gender_series = pd.Series(['Female', 'Male', 'Female'])
country_series = pd.Series(['USA', 'Canada', 'UK'])

# Creating a DataFrame using the Series
data = {
    'Name': names_series,
    'Age': age_series,
    'Gender': gender_series,
    'Country': country_series
}

df = pd.DataFrame(data)

# Displaying the created DataFrame
print("Created DataFrame:")
print(df)

Created DataFrame:
     Name  Age  Gender Country
0   Alice   25  Female     USA
1     Bob   30    Male  Canada
2  Claire   27  Female      UK
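
An alternative to the dictionary approach is `pd.concat` with `axis=1`, which takes column labels from each Series' `name`. The sketch below also shows that Series are combined by index label rather than by position; the `short_ages` Series is a hypothetical example with one label missing.

```python
import pandas as pd

names = pd.Series(['Alice', 'Bob', 'Claire'], name='Name')
ages = pd.Series([25, 30, 27], name='Age')

# Concatenating along axis=1 uses each Series' name as the column label
df = pd.concat([names, ages], axis=1)
print(df)

# Series are aligned on their index labels, not on position;
# a label missing from one Series produces NaN in that column
short_ages = pd.Series([25, 30], index=[0, 1], name='Age')
aligned = pd.concat([names, short_ages], axis=1)
print(aligned)
```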
