# Assignment_13 (pandas)

## Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

```python
import pandas as pd

# Create the Series from a list
data = [4, 8, 15, 16, 23, 42]
series = pd.Series(data)

# Print the Series
print(series)
```

Output:
```
0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64
```


## Q2. Create a variable of list type containing 10 elements, apply the `pandas.Series` function to it, and print the result.

```python
import pandas as pd

# Create a list with 10 elements
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Convert the list to a Pandas Series
my_series = pd.Series(my_list)

# Print the Series
print(my_series)
```

Output:
```
0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64
```


## Q3. Create a Pandas DataFrame that contains the following data:

| Name   | Age | Gender |
|--------|-----|--------|
| Alice  | 25  | Female |
| Bob    | 30  | Male   |
| Claire | 27  | Female |

```python
import pandas as pd

# Column names map to lists of values
data = {
    "Name": ["Alice", "Bob", "Claire"],
    "Age": [25, 30, 27],
    "Gender": ["Female", "Male", "Female"]
}

# Load the data into a DataFrame object
df = pd.DataFrame(data)

print(df)
```

Output:
```
     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female
```


## Q4. What is a ‘DataFrame’ in pandas, and how is it different from `pandas.Series`? Explain with an example.

### Pandas Series vs DataFrame

In Pandas, both `Series` and `DataFrame` are fundamental data structures. Here’s a detailed comparison:

### Pandas Series
- A `Series` is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.).
- Think of a Series as a single column of data.

### Pandas DataFrame
- A `DataFrame` is a two-dimensional labeled data structure with columns of potentially different types. 
- It is like a table or a spreadsheet in Excel, where data is aligned in rows and columns.

### Example to Illustrate the Difference

#### Pandas Series

```python
import pandas as pd

# Creating a Pandas Series
data = [4, 8, 15, 16, 23, 42]
series = pd.Series(data)

# Printing the Series
print("Pandas Series:")
print(series)
```

Output:
```
Pandas Series:
0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64
```

#### Pandas DataFrame

```python
import pandas as pd

# Creating a Pandas DataFrame
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': ['a', 'b', 'c', 'd', 'e']
}
df = pd.DataFrame(data)

# Printing the DataFrame
print("Pandas DataFrame:")
print(df)
```

Output:
```
Pandas DataFrame:
   A   B  C
0  1  10  a
1  2  20  b
2  3  30  c
3  4  40  d
4  5  50  e
```

### Key Differences
1. **Dimensionality:**
   - A `Series` is one-dimensional.
   - A `DataFrame` is two-dimensional.

2. **Data Structure:**
   - A `Series` can be seen as a single column of data.
   - A `DataFrame` is like a table with multiple columns.

3. **Usage:**
   - Use `Series` when you need a single column of data.
   - Use `DataFrame` when you need a table with multiple rows and columns.

4. **Flexibility:**
   - A `Series` has a single dtype for all of its values; mixed values are stored with the generic `object` dtype.
   - In a `DataFrame`, different columns can have different data types.
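
The differences listed above can be checked directly. The following is a minimal sketch on small made-up data (the values are illustrative, not taken from the assignment): it inspects `ndim`, shows that selecting one column of a `DataFrame` returns a `Series`, and shows how dtypes behave in each structure.

```python
import pandas as pd

series = pd.Series([4, 8, 15, 16, 23, 42])
df = pd.DataFrame({
    'A': [1, 2, 3],
    'C': ['a', 'b', 'c'],
})

# Dimensionality: a Series has 1 dimension, a DataFrame has 2
print(series.ndim)
print(df.ndim)

# Selecting a single column of a DataFrame gives back a Series
column_a = df['A']
print(type(column_a).__name__)

# Each DataFrame column keeps its own dtype
print(df.dtypes)

# A Series has one dtype; mixed values fall back to 'object'
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)
```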

### Summary

- **Series**: One-dimensional array with labels.
- **DataFrame**: Two-dimensional table with rows and columns, where columns can contain different types of data.


## Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?

Pandas provides a wide range of functions to manipulate data in a `DataFrame`. Here are some common functions and their uses:

### Common Functions

1. **`head()` and `tail()`**: View the first or last n rows of the DataFrame.
   - Example: `df.head(10)` shows the first 10 rows.
   
2. **`info()`**: Get a concise summary of the DataFrame.
   - Example: `df.info()` provides information about the DataFrame including the index dtype and columns, non-null values, and memory usage.

3. **`describe()`**: Generate descriptive statistics.
   - Example: `df.describe()` gives count, mean, std, min, 25%, 50%, 75%, and max for each numerical column.

4. **`sort_values()`**: Sort the DataFrame by the values of one or more columns.
   - Example: `df.sort_values(by='column_name')` sorts the DataFrame by the specified column.

5. **`drop()`**: Remove rows or columns by specifying label names and axis.
   - Example: `df.drop(columns=['column_name'])` removes the specified column.

6. **`iloc[]` and `loc[]`**: Select rows and columns by integer position (`iloc[]`) or by label or boolean array (`loc[]`).
   - Example: `df.iloc[0:5]` selects the first 5 rows.
   - Example: `df.loc[df['column_name'] > 10]` selects rows where the column value is greater than 10.

7. **`groupby()`**: Group the DataFrame using a mapper or by a Series of columns.
   - Example: `df.groupby('column_name').mean()` groups by the specified column and calculates the mean for each group. In pandas 2.0 and later, pass `numeric_only=True` if the DataFrame also has non-numeric columns.

8. **`merge()`**: Merge DataFrames using a database-style join.
   - Example: `pd.merge(df1, df2, on='key_column')` merges `df1` and `df2` on the specified key column.

9. **`apply()`**: Apply a function along an axis of the DataFrame.
   - Example: `df.apply(np.sum, axis=0)` applies the sum function to each column (this assumes NumPy has been imported as `np`).

10. **`pivot_table()`**: Create a spreadsheet-style pivot table as a DataFrame.
    - Example: `df.pivot_table(index='column1', columns='column2', values='column3', aggfunc='sum')` creates a pivot table.
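
Several of the functions above can be tried together on one small table. This is a minimal sketch; the column names (`name`, `age`, `score`) and the values are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Claire', 'Dan'],
    'age': [25, 30, 27, 35],
    'score': [88.0, 92.5, 79.0, 85.5],
})

first_two = df.head(2)                               # first 2 rows
by_age = df.sort_values(by='age', ascending=False)   # oldest first
no_score = df.drop(columns=['score'])                # returns a new DataFrame
first_rows = df.iloc[0:2]                            # rows at positions 0 and 1
over_26 = df.loc[df['age'] > 26]                     # rows matching a boolean mask

# Sum each numeric column (axis=0 means "apply to each column")
col_totals = df[['age', 'score']].apply(lambda col: col.sum(), axis=0)

print(by_age)
print(over_26)
print(col_totals)
```

Note that `sort_values()` and `drop()` return new objects by default, so `df` itself still contains the `score` column afterwards.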

### Example: Using `groupby()` and `mean()`

Let's say we have a DataFrame of sales data and we want to calculate the average sales for each product.

```python
import pandas as pd

# Sample sales data
data = {
    'Product': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Sales': [100, 150, 200, 250, 300, 350],
    'Region': ['North', 'South', 'North', 'South', 'North', 'South']
}

df = pd.DataFrame(data)

# Calculate average sales for each product
average_sales = df.groupby('Product')['Sales'].mean()

# Print the result
print(average_sales)
```

Output:
```
Product
A    125.0
B    225.0
C    325.0
Name: Sales, dtype: float64
```

In this example, the `groupby()` function is used to group the data by the 'Product' column, and the `mean()` function calculates the average sales for each product. This is a common operation in data analysis to summarize and aggregate data based on specific criteria.
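
The same sales data can also illustrate `merge()` and `pivot_table()` from the list above. In this sketch the `categories` lookup table and its `Category` values are assumptions added for illustration; they are not part of the original data.

```python
import pandas as pd

sales = pd.DataFrame({
    'Product': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Sales': [100, 150, 200, 250, 300, 350],
    'Region': ['North', 'South', 'North', 'South', 'North', 'South'],
})

# Hypothetical lookup table, not part of the original data
categories = pd.DataFrame({
    'Product': ['A', 'B', 'C'],
    'Category': ['Toys', 'Books', 'Games'],
})

# Database-style join: every sales row gains its product's category
merged = pd.merge(sales, categories, on='Product')

# One row per product, one column per region, cells hold summed sales
pivot = sales.pivot_table(
    index='Product', columns='Region', values='Sales', aggfunc='sum'
)

print(merged)
print(pivot)
```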

## Q6. Which of the following is mutable in nature: Series, DataFrame, Panel?

In Pandas, both `Series` and `DataFrame` are mutable. `Panel` was deprecated in version 0.20.0 and removed in version 0.25.0, so it is not available in current versions of Pandas.

### Mutable Data Structures in Pandas

1. **Series**: 
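   - A `Series` is value-mutable: you can change its values and replace its index in place. The Pandas documentation describes its length as fixed, and methods such as `drop()` return a new `Series` instead of resizing the original.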
   - Example:
     ```python
     import pandas as pd

     # Creating a Series
     s = pd.Series([1, 2, 3, 4, 5])

     # Modifying a value
     s[0] = 10
     print(s)
     ```

2. **DataFrame**: 
   - A `DataFrame` is also mutable. You can add, modify, or remove columns and rows.
   - Example:
     ```python
     import pandas as pd

     # Creating a DataFrame
     df = pd.DataFrame({
         'A': [1, 2, 3],
         'B': [4, 5, 6]
     })

     # Modifying a value
     df.at[0, 'A'] = 10
     print(df)
     ```
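
Beyond changing single values, a `DataFrame` can also change shape in place. The sketch below, on the same small data as above, adds and deletes a column on the original object and contrasts that with `Series.drop()`, which returns a new object and leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Adding a column modifies df in place
df['C'] = df['A'] + df['B']

# Deleting a column with del also modifies df in place
del df['B']
print(df)

# drop() on a Series returns a new Series; s keeps all five values
s = pd.Series([1, 2, 3, 4, 5])
shorter = s.drop(0)
print(len(s), len(shorter))
```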

### Deprecated: Panel

- **Panel**: 
  - The `Panel` data structure was three-dimensional and mutable. It was deprecated in Pandas 0.20.0 and removed in 0.25.0. The recommended replacements are `MultiIndex` DataFrames or the `xarray` library for multi-dimensional data.
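
As a rough illustration of the `MultiIndex` alternative, the sketch below stores three-dimensional data (item × date × field) in an ordinary `DataFrame` with a two-level row index. The item names, dates, and values are hypothetical.

```python
import pandas as pd

# Hypothetical 3-D data: two items, two dates, two fields per (item, date)
index = pd.MultiIndex.from_product(
    [['item1', 'item2'], ['2024-01-01', '2024-01-02']],
    names=['item', 'date'],
)
df = pd.DataFrame(
    {'open': [1.0, 1.1, 2.0, 2.1], 'close': [1.2, 1.3, 2.2, 2.3]},
    index=index,
)
print(df)

# Selecting one item returns a regular 2-D DataFrame indexed by date
item1 = df.loc['item1']
print(item1)

# A full (item, date) key plus a column name reaches a single value
value = df.loc[('item2', '2024-01-02'), 'close']
print(value)
```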

Therefore, both `Series` and `DataFrame` are mutable in nature. The `Panel` is no longer in use in current Pandas versions.

## Q7. Create a DataFrame using multiple Series. Explain with an example.

You can create a DataFrame from multiple Series by combining them into a dictionary where the keys are the column names and the values are the Series. Here’s an example:

### Example

Let's create three Series and then combine them into a DataFrame.

```python
import pandas as pd

# Creating Series
series1 = pd.Series([1, 2, 3], name='Column1')
series2 = pd.Series([4, 5, 6], name='Column2')
series3 = pd.Series([7, 8, 9], name='Column3')

# Combining Series into a DataFrame
df = pd.DataFrame({
    'Column1': series1,
    'Column2': series2,
    'Column3': series3
})

# Printing the DataFrame
print(df)
```

Output:
```
   Column1  Column2  Column3
0        1        4        7
1        2        5        8
2        3        6        9
```

### Explanation

1. **Creating Series**: 
   - We create three Series (`series1`, `series2`, `series3`) with names 'Column1', 'Column2', and 'Column3'. Each Series has three elements.

2. **Combining Series into a DataFrame**:
   - We create a dictionary where the keys are the column names ('Column1', 'Column2', 'Column3') and the values are the corresponding Series.
   - We pass this dictionary to the `pd.DataFrame` constructor to create the DataFrame.

3. **Printing the DataFrame**:
   - The resulting DataFrame has three columns ('Column1', 'Column2', 'Column3') with the values from the Series.

This method allows you to easily combine multiple Series into a structured DataFrame, which can be useful for data manipulation and analysis.
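
Two related details are worth a quick sketch. First, `pd.concat(..., axis=1)` is another way to combine Series, and it takes the column labels from each Series' `name`. Second, when Series are combined, values are aligned by index label rather than by position, so labels missing from one Series become `NaN`. The `left` and `right` Series below are hypothetical examples.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3], name='Column1')
series2 = pd.Series([4, 5, 6], name='Column2')

# concat along axis=1 uses each Series' name as the column label
df = pd.concat([series1, series2], axis=1)
print(df)

# Series are aligned on index labels; 'a' is missing from `right`
left = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
right = pd.Series([10, 20], index=['b', 'c'])
aligned = pd.DataFrame({'left': left, 'right': right})
print(aligned)
```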