**Q1.** Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

**Answer:**


In [1]:
import pandas as pd

arr = [4, 8, 15, 16, 23, 42]
series = pd.Series(arr)

print(series)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


**Q3.** Create a Pandas DataFrame that contains the following data:

| Name   | Age | Gender |
|--------|-----|--------|
| Alice  | 25  | Female |
| Bob    | 30  | Male   |
| Claire | 27  | Female |

Then, print the DataFrame.

**Answer:**

In [2]:
import pandas as pd

data = { 
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}

df = pd.DataFrame(data)

print(df)

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


**Q4.** What is a `DataFrame` in pandas, and how is it different from a `pandas.Series`? Explain with an example.

**Answer:**

In Pandas, a DataFrame is a two-dimensional data structure that represents tabular data. It consists of rows and columns, similar to a table in a relational database or a spreadsheet. Each column in a DataFrame is a Pandas Series: a one-dimensional labeled array with a single data type (dtype). Different columns can have different dtypes.

The main differences between a DataFrame and a Series are as follows:

1. Dimensionality: A DataFrame is a two-dimensional data structure with rows and columns, while a Series is a one-dimensional data structure.

2. Structure: A DataFrame can contain multiple columns, each with its own data type. In contrast, a Series represents a single column of data with one dtype. Mixed Python objects can still be stored in a Series, but only under the generic `object` dtype.

3. Flexibility: DataFrames provide more flexibility for data manipulation and analysis. They offer various built-in methods and functions for filtering, merging, grouping, and aggregating data. Series, on the other hand, are useful for performing operations on a single column of data.

Here's an example to illustrate the difference between a DataFrame and a Series:

```python
import pandas as pd

# Creating a DataFrame
data = {
    'Name': ['John', 'Emily', 'Charlie'],
    'Age': [25, 30, 35],
    'Country': ['USA', 'Canada', 'UK']
}

df = pd.DataFrame(data)
print("DataFrame:")
print(df)
print()

# Creating a Series
ages = pd.Series([25, 30, 35], name='Age')
print("Series:")
print(ages)
```

Output:

```
DataFrame:
      Name  Age Country
0     John   25     USA
1    Emily   30  Canada
2  Charlie   35      UK

Series:
0    25
1    30
2    35
Name: Age, dtype: int64
```

In this example, we create a DataFrame `df` from a dictionary with the keys 'Name', 'Age', and 'Country'. Each key becomes a column label, and each column is a Pandas Series whose name is that label. The DataFrame holds multiple columns, and each column can have a different data type.

Additionally, we create a Series `ages` that represents the 'Age' column. It contains only the ages and has the name 'Age'. The Series is one-dimensional and homogeneous, containing only integer values.

By comparing the output, you can observe that the DataFrame represents a tabular structure with multiple columns, whereas the Series represents a single column of data with its associated index and data type.
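
The statement that each DataFrame column is itself a Series can be checked directly. The following is a minimal sketch that reuses the example data above; the variable names `age_column` and `age_frame` are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emily', 'Charlie'],
    'Age': [25, 30, 35],
    'Country': ['USA', 'Canada', 'UK'],
})

age_column = df['Age']     # single brackets: one column, returned as a Series
age_frame = df[['Age']]    # double brackets: a one-column DataFrame

print(type(age_column).__name__, age_column.ndim, age_column.shape)  # Series 1 (3,)
print(type(age_frame).__name__, age_frame.ndim, age_frame.shape)     # DataFrame 2 (3, 1)

# Each column carries its own dtype
print(df.dtypes)
```

Selecting with a single label gives back the one-dimensional Series, while selecting with a list of labels keeps the two-dimensional DataFrame shape even when only one column is requested.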

**Q5.** What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?

**Answer:**

Pandas provides a wide range of functions to manipulate data in a DataFrame. Some common functions you can use include:

1. **head() and tail()**: These functions allow you to view the first or last n rows of the DataFrame, respectively. They are useful for quickly inspecting the data or getting a preview of its structure.

2. **info()**: This function provides a summary of the DataFrame, including the number of non-null values, data types, and memory usage. It helps in understanding the overall data composition and identifying missing values or potential data type issues.

3. **describe()**: This function generates descriptive statistics for each numeric column in the DataFrame, such as count, mean, standard deviation, minimum, maximum, and quartiles. It gives a quick overview of the distribution and central tendencies of the numerical data.

4. **value_counts()**: This function counts the occurrences of each unique value. It is usually called on a single column, for example `df['Gender'].value_counts()`, and is useful for categorical data analysis and understanding the frequency distribution of different categories.

5. **sort_values()**: This function sorts the DataFrame by one or more columns. It allows you to arrange the data in ascending or descending order based on specific criteria. Sorting is helpful for finding the highest or lowest values, identifying outliers, or ordering data for visualization.

6. **groupby()**: This function is used for grouping data based on one or more columns. It enables you to perform aggregate operations on specific groups of data, such as calculating sums, means, counts, or applying custom functions.

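7. **fillna()**: This function replaces missing (NaN, "Not a Number") values in the DataFrame with a value you specify, either one value for the whole DataFrame or one value per column. Forward and backward filling are available through the related `ffill()` and `bfill()` methods. It helps in handling missing data and ensuring the completeness of the dataset.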

8. **drop()**: This function removes rows or columns from the DataFrame by label and returns a new DataFrame by default. It helps in eliminating unnecessary data. To remove rows that match a condition, use boolean indexing instead, or pass the index of the matching rows to `drop()`.

Example:
Suppose you have a DataFrame containing customer data, including their age, gender, and purchase amount. You want to perform some analysis based on gender. Here's an example of using the `groupby()` function to calculate the average purchase amount by gender:

```python
import pandas as pd

# Creating a sample DataFrame
data = {
    'Name': ['John', 'Emily', 'Charlie', 'Alice'],
    'Age': [25, 30, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female'],
    'PurchaseAmount': [100, 200, 150, 180]
}

df = pd.DataFrame(data)

# Calculate average purchase amount by gender
average_purchase_by_gender = df.groupby('Gender')['PurchaseAmount'].mean()

print(average_purchase_by_gender)
```

Output:
```
Gender
Female    190.0
Male      125.0
Name: PurchaseAmount, dtype: float64
```

In this example, the `groupby()` function is used to group the data by the 'Gender' column. Then, the `mean()` function is applied to the 'PurchaseAmount' column within each group to calculate the average purchase amount by gender. The resulting Series displays the average purchase amount for each gender.

This is just one example of using the `groupby()` function, and there are numerous other functions and techniques available in Pandas to manipulate and analyze data in DataFrames.
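
A few of the other functions from the list can be sketched in the same way. This is an illustrative example only: it reuses the sample customer data, with one purchase amount deliberately set to NaN so that `fillna()` has something to fill, and the names `filled`, `ranked`, and `without_age` are assumptions made for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emily', 'Charlie', 'Alice'],
    'Age': [25, 30, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female'],
    # One value is deliberately missing for the fillna() step
    'PurchaseAmount': [100.0, float('nan'), 150.0, 180.0],
})

# fillna(): replace the missing purchase amount with 0
filled = df.fillna({'PurchaseAmount': 0})

# sort_values(): order rows by purchase amount, largest first
ranked = filled.sort_values('PurchaseAmount', ascending=False)

# head(): preview the first two rows of the sorted result
print(ranked.head(2))

# value_counts(): frequency of each category in one column
print(filled['Gender'].value_counts())

# drop(): remove a column by label; the original df is left unchanged
without_age = filled.drop(columns=['Age'])
print(without_age.columns.tolist())  # ['Name', 'Gender', 'PurchaseAmount']
```

Each call here returns a new object rather than modifying `df`, which makes it easy to chain steps or keep the original data for comparison.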

**Q6.** Which of the following is mutable in nature: Series, DataFrame, or Panel?

**Answer:**

All three options (Series, DataFrame, Panel) are value-mutable: the data they hold can be changed after creation. They differ in whether their size can be changed in place.

In Pandas, a mutable object can be modified after it is created. Here's a breakdown of the mutability of each object:

1. **Series**: A Pandas Series is value-mutable, meaning its individual elements can be updated in place. The pandas documentation has traditionally described a Series as size-immutable, because operations that change its length, such as `drop()` or `pd.concat()`, return a new Series rather than resizing the original.

2. **DataFrame**: A Pandas DataFrame is both value-mutable and size-mutable. You can update values and add or remove columns in place.

3. **Panel**: A Pandas Panel was a three-dimensional data structure, which could be thought of as a container for multiple DataFrames. While it existed, it was also value-mutable and size-mutable, so items could be added to or removed from it.

It's worth noting that the Panel data structure was deprecated in pandas 0.20 and removed in pandas 0.25. Multi-indexed DataFrames or the xarray library are the recommended alternatives for handling multi-dimensional data.
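
As a minimal sketch of what modifying an object after creation looks like for Series and DataFrame, the example below updates values in place and changes the set of columns of a DataFrame. The data is made up for illustration.

```python
import pandas as pd

# Series: overwrite an existing value in place
series = pd.Series([4, 8, 15], index=['a', 'b', 'c'])
series['b'] = 80
print(series.tolist())       # [4, 80, 15]

# DataFrame: overwrite a cell, then add and remove columns in place
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df.loc[0, 'Age'] = 26                 # change a single value
df['Gender'] = ['Female', 'Male']     # insert a new column
del df['Name']                        # remove a column
print(df.columns.tolist())   # ['Age', 'Gender']
```

In both cases the same object is modified; no new Series or DataFrame is created by these assignments.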

**Q7.** Create a DataFrame using multiple Series. Explain with an example.

**Answer:**

To create a DataFrame using multiple Series, you can pass the Series objects as a dictionary to the `pd.DataFrame()` constructor. Each Series becomes a column in the resulting DataFrame, with the dictionary keys as column labels, and the rows are aligned on the index of each Series. Here's an example:

In [3]:
import pandas as pd

# Creating individual Series
name_series = pd.Series(['John', 'Emily', 'Charlie'])
age_series = pd.Series([25, 30, 35])
country_series = pd.Series(['USA', 'Canada', 'UK'])

# Creating DataFrame using Series
data = {
    'Name': name_series,
    'Age': age_series,
    'Country': country_series
}

df = pd.DataFrame(data)

print(df)

      Name  Age Country
0     John   25     USA
1    Emily   30  Canada
2  Charlie   35      UK
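
One detail worth knowing when combining Series this way is that pandas matches rows by index label, not by position. The sketch below uses hypothetical string labels (`'john'`, `'emily'`, `'charlie'`) to show what happens when the Series are in a different order and one label is missing.

```python
import pandas as pd

# Hypothetical labels, chosen only to illustrate alignment
ages = pd.Series([25, 30, 35], index=['john', 'emily', 'charlie'])
# Different order, and no entry for 'charlie'
countries = pd.Series(['Canada', 'USA'], index=['emily', 'john'])

df = pd.DataFrame({'Age': ages, 'Country': countries})
print(df)

# Values are matched by label, and the missing label becomes NaN
print(df.loc['john', 'Country'])              # USA
print(pd.isna(df.loc['charlie', 'Country']))  # True
```

When all Series share the same default integer index, as in the answer above, alignment is invisible. It matters once the Series come from different sources or have been filtered independently.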
