## 1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

In [2]:
import pandas as pd
data = [4, 8, 15, 16, 23, 42]
series = pd.Series(data)
series

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64

## 2. Create a list variable containing 10 elements, apply the `pandas.Series` function to it, and print the result.

In [4]:
import random 
variable = [random.randint(1,33) for _ in range(10)]
series = pd.Series(variable)
series

0    29
1    11
2    25
3    14
4    28
5     3
6    25
7    18
8    25
9    23
dtype: int64
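
Because the list above is generated randomly, the values change on every run. A minimal sketch of a reproducible variant, assuming a fixed seed is acceptable for the exercise (the seed value `0` and the name `seeded_series` are illustrative choices, not part of the original answer):

```python
import random

import pandas as pd

# Seeding the generator makes the "random" list identical on every run
random.seed(0)

# Ten integers drawn from the inclusive range 1..33
values = [random.randint(1, 33) for _ in range(10)]

seeded_series = pd.Series(values)
print(seeded_series)
```

Re-running the cell with the same seed yields the same Series, which makes the printed output easier to compare between sessions.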

## 3. Create a Pandas DataFrame that contains the following data:
![image.png](attachment:355761a7-d52a-425c-8bef-dc497ca5dc99.png)
## Then, print the DataFrame.

In [10]:
data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}

pd.DataFrame(data)

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


## 4. What is a `DataFrame` in pandas, and how is it different from a `pandas.Series`? Explain with an example.

### DataFrame:

- A DataFrame is a 2-dimensional labeled data structure, somewhat like a spreadsheet or SQL table. It consists of rows and columns.
- Each column in a DataFrame is a Pandas Series.
- A DataFrame is a collection of Series that share a common index.
- It is suitable for handling structured data where you have multiple variables or attributes associated with each observation.

### Series:

- A Series is a 1-dimensional labeled array that can hold data of any type (integers, strings, floats, etc.).
- It is essentially a single column or a single variable.
- Series objects have their own indexes that allow for fast and efficient data retrieval.
- Series are used for handling single columns or extracting a single variable from a DataFrame.

In [11]:
import pandas as pd

# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}

df = pd.DataFrame(data)

# Creating a Series
ages = pd.Series([25, 30, 27])

# Printing the DataFrame and Series
print("DataFrame:")
print(df)
print("\nSeries:")
print(ages)


DataFrame:
     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female

Series:
0    25
1    30
2    27
dtype: int64
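
The relationship between the two structures can also be checked directly. The sketch below, which rebuilds the same small DataFrame so that it runs on its own, shows that selecting a single column returns a Series that shares the DataFrame's row index (the name `age_column` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female'],
})

# Selecting one column by label returns a Series, not a DataFrame
age_column = df['Age']
print(type(age_column).__name__)

# The column carries the same row index as its parent DataFrame
print(age_column.index.equals(df.index))

# A DataFrame has two dimensions, a Series has one
print(df.ndim, age_column.ndim)
```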


## 5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?


| Function          | Description                                                        | Example                                           |
|-------------------|--------------------------------------------------------------------|---------------------------------------------------|
| `head()`          | View the first `n` rows of a DataFrame (5 by default).             | `df.head()`                                       |
| `tail()`          | View the last `n` rows of a DataFrame (5 by default).              | `df.tail()`                                       |
| `info()`          | Print a summary of the index, column dtypes, and non-null counts.  | `df.info()`                                       |
| `describe()`      | Generate summary statistics for numerical columns.                 | `df.describe()`                                   |
| `value_counts()`  | Count how often each unique value occurs in a column.              | `df['Column'].value_counts()`                     |
| `groupby()`       | Group data by one or more columns for aggregation.                 | `df.groupby('Column')['Value'].mean()`            |
| `sort_values()`   | Sort the DataFrame by one or more columns.                         | `df.sort_values(by='Column', ascending=True)`     |
| Boolean indexing  | Select rows that satisfy a condition (`filter()` selects by label, not by value). | `df[df['Column'] > value]`         |
| `pivot_table()`   | Create a pivot table for summarization.                            | `df.pivot_table(index='Index', values='Value')`   |
| `fillna()`        | Fill missing values in the DataFrame.                              | `df['Column'] = df['Column'].fillna(value)`       |
| `drop()`          | Remove columns or rows from the DataFrame.                         | `df = df.drop(columns='Column')`                  |



In [13]:
# Example: Grouping data by 'Gender' and calculating the mean age for each group
df.groupby('Gender')['Age'].mean()


Gender
Female    26.0
Male      30.0
Name: Age, dtype: float64

In [14]:
# Example: Creating a pivot table to calculate the mean age by 'Gender'
df.pivot_table(index='Gender', values='Age', aggfunc='mean')


         Age
Gender
Female  26.0
Male    30.0
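
The examples above cover grouping and pivoting. As a further sketch, the cell below exercises filling missing values, selecting rows by condition, sorting, and dropping a column. It assumes a variant of the sample data with one missing age; the frame `people` and the other variable names are hypothetical and introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data with one missing age (an assumption made for this example)
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, np.nan, 27],
    'Gender': ['Female', 'Male', 'Female'],
})

# fillna: replace the missing age with the mean of the known ages
people['Age'] = people['Age'].fillna(people['Age'].mean())

# Boolean indexing: keep only rows where Age is greater than 25
over_25 = people[people['Age'] > 25]

# sort_values: order rows from oldest to youngest
by_age_desc = people.sort_values(by='Age', ascending=False)

# drop: return a copy without the 'Gender' column
without_gender = people.drop(columns='Gender')

print(over_25)
print(by_age_desc)
print(without_gender)
```

Each call returns a new object rather than modifying `people`, except for the explicit assignment back to `people['Age']`.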


## 6. Which of the following are mutable in nature: Series, DataFrame, Panel?

Series and DataFrame are both mutable: their values can be changed in place, and a DataFrame can also have columns added or removed.

`Panel` is no longer part of pandas. It was a 3D data structure, similar to a `DataFrame` but with three dimensions. Due to its limited use and the complexity it introduced, it was deprecated in pandas 0.20.0 and removed in 0.25.0 in favor of using a `DataFrame` with a `MultiIndex` for handling multi-dimensional data.
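
To illustrate mutability, here is a minimal sketch that changes values in place in a Series and a DataFrame and then adds a column to the DataFrame. The variable names `s` and `frame` are illustrative.

```python
import pandas as pd

# A Series is value-mutable: an element can be overwritten in place
s = pd.Series([4, 8, 15])
s.iloc[0] = 99

# A DataFrame is value-mutable as well
frame = pd.DataFrame({'Age': [25, 30]})
frame.loc[0, 'Age'] = 26

# A DataFrame can also grow: assigning to a new label adds a column
frame['Name'] = ['Alice', 'Bob']

print(s)
print(frame)
```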

## 7. Create a DataFrame using multiple Series. Explain with an example.

In [15]:
# Create multiple Series
name_series = pd.Series(['Alice', 'Bob', 'Claire'])
age_series = pd.Series([25, 30, 27])
gender_series = pd.Series(['Female', 'Male', 'Female'])

# Combine the Series into a DataFrame
data = {'Name': name_series, 'Age': age_series, 'Gender': gender_series}

In [16]:
pd.DataFrame(data)

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female
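
In the example above all three Series use the same default index, so the rows line up one to one. When the indexes differ, pandas aligns the Series by label and fills the gaps with `NaN`. The sketch below assumes custom labels `'a'`, `'b'`, `'c'` chosen purely for illustration, and also shows `pd.concat` with `axis=1` as an alternative way to combine Series column-wise.

```python
import pandas as pd

# Two Series with different indexes: 'ages' has no entry for label 'c'
names = pd.Series(['Alice', 'Bob', 'Claire'], index=['a', 'b', 'c'])
ages = pd.Series([25, 30], index=['a', 'b'])

# Building from a dict aligns on index labels; missing entries become NaN
combined = pd.DataFrame({'Name': names, 'Age': ages})
print(combined)

# Alternative: name each Series, then concatenate them as columns
side_by_side = pd.concat([names.rename('Name'), ages.rename('Age')], axis=1)
print(side_by_side)
```

Note that the `Age` column becomes floating point in both results, because `NaN` cannot be stored in a plain integer column.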
