# **Pandas Basics**

## Q1 **Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.**

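```python
import pandas as pd

# Build a Series from the six values; pandas adds a default integer index
data = [4, 8, 15, 16, 23, 42]
series = pd.Series(data)
print(series)
```

Output:
```
0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64
```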

## Q2 **Create a list variable containing 10 elements, apply the `pandas.Series` function to it, and print the result.**

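```python
import pandas as pd

# A plain Python list with 10 elements
list_data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# pd.Series wraps the list and adds a default integer index
series_from_list = pd.Series(list_data)

print(series_from_list)
```

Output:
```
0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64
```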


## Q3 **Create a Pandas DataFrame that contains the following data:**
![image.png](attachment:image.png)

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values
data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}

df = pd.DataFrame(data)

print(df)
```

Output:
```
     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female
```


## Q4 **What is a `DataFrame` in pandas, and how is it different from a `pandas.Series`? Explain with an example.**

In Pandas, a `DataFrame` is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). It can be thought of as a dictionary-like container for Series objects, where each Series represents a column of data.

On the other hand, a `Series` is a one-dimensional labeled array capable of holding any data type (integer, string, float, etc.). It is similar to a column in a DataFrame or a single list of data with an index.

### Differences between DataFrame and Series:

1. **Dimensionality**:
   - `Series`: One-dimensional array.
   - `DataFrame`: Two-dimensional array.

2. **Structure**:
   - `Series`: Consists of an index and values.
   - `DataFrame`: Consists of rows and columns, with both row and column labels.

3. **Data Type**:
   - `Series`: Has a single dtype for all of its values; mixed values are stored under the generic `object` dtype.
   - `DataFrame`: Can hold multiple data types (different columns can be of different types).
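
The differences listed above can be checked directly on small objects. The following is a minimal sketch; the sample values and the variable names `ages` and `mixed` are illustrative assumptions, not part of the original exercise.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
})

# Selecting one column returns a Series: one dimension, one dtype
ages = df['Age']
print(type(ages).__name__)   # Series
print(ages.ndim, df.ndim)    # 1 2
print(ages.dtype)

# A DataFrame tracks one dtype per column
print(df.dtypes)

# A Series built from mixed values falls back to the generic object dtype
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)           # object
```

Selecting a column with `df['Age']` is also a quick way to see that a DataFrame behaves like a collection of Series sharing one row index.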

### Example:

#### Creating a Series:
```python
import pandas as pd

# Creating a Series
series_data = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print(series_data)
```

Output:
```
a    10
b    20
c    30
d    40
e    50
dtype: int64
```

#### Creating a DataFrame:
```python
import pandas as pd

# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}
df = pd.DataFrame(data)
print(df)
```

Output:
```
     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female
```

In this example:
- The `Series` represents a single column of data with index labels.
- The `DataFrame` represents a table of data with multiple columns and rows, where each column can have a different data type. 

### Visual Representation:
- **Series**: One-dimensional
```
a    10
b    20
c    30
d    40
e    50
dtype: int64
```
- **DataFrame**: Two-dimensional
```
     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female
```

The key distinction is that while a Series is suitable for storing a single column of data, a DataFrame is used for storing a tabular dataset with multiple columns, allowing for more complex data manipulation and analysis.

## Q5 **What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?**

Pandas provides a rich set of functions to manipulate data in a DataFrame. Here are some common functions and examples of their use:

1. **`head()` and `tail()`**:
   - **Usage**: View the first or last few rows of the DataFrame.
   - **Example**: `df.head(3)` displays the first 3 rows of the DataFrame.

2. **`info()`**:
   - **Usage**: Get a summary of the DataFrame including the data types and non-null counts.
   - **Example**: `df.info()` helps to understand the structure of the DataFrame.

3. **`describe()`**:
   - **Usage**: Generate descriptive statistics.
   - **Example**: `df.describe()` reports the count, mean, standard deviation, minimum, quartiles (the 50% row is the median), and maximum for numerical columns.

4. **`sort_values()`**:
   - **Usage**: Sort the DataFrame by the values of one or more columns.
   - **Example**: `df.sort_values(by='Age')` sorts the DataFrame by the 'Age' column.

5. **`groupby()`**:
   - **Usage**: Group the DataFrame using a column and perform aggregate functions.
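   - **Example**: `df.groupby('Gender')['Age'].mean()` calculates the mean age for each group in the 'Gender' column. Selecting the numeric column first matters: in pandas 2.0 and later, calling `.mean()` on groups that still contain text columns such as 'Name' raises a `TypeError`.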

6. **`drop()`**:
   - **Usage**: Remove rows or columns by specifying label names and axis.
   - **Example**: `df.drop(columns=['Gender'])` removes the 'Gender' column from the DataFrame.

7. **`fillna()`**:
   - **Usage**: Fill NA/NaN values using a specified method or value.
   - **Example**: `df.fillna(0)` replaces all NaN values with 0.

8. **`apply()`**:
   - **Usage**: Apply a function along an axis of the DataFrame, or element-wise to a single column (a Series).
   - **Example**: `df['Age'].apply(lambda x: x + 1)` increments every value in the 'Age' column by 1.

9. **`merge()`**:
   - **Usage**: Merge DataFrame or named Series objects with a database-style join.
   - **Example**: `pd.merge(df1, df2, on='Name')` merges two DataFrames on the 'Name' column.
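
Several of the functions listed above are often combined in one short cleaning step. The sketch below assumes two small, made-up tables (`people` and `cities`) that share a 'Name' column; the data is hypothetical and only meant to show the calls working together.

```python
import pandas as pd

# Hypothetical sample data with one missing age
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25.0, None, 27.0],
})
cities = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'City': ['Paris', 'Oslo', 'Lima'],
})

# fillna(): replace the missing age with the mean of the known ages
people['Age'] = people['Age'].fillna(people['Age'].mean())

# merge(): database-style join on the shared 'Name' column
merged = pd.merge(people, cities, on='Name')

# sort_values(): oldest first
merged = merged.sort_values(by='Age', ascending=False)

# drop(): returns a new DataFrame without the 'City' column
without_city = merged.drop(columns=['City'])

print(merged)
print(without_city.head(2))
```

Note that `drop()` and `sort_values()` return new objects by default, so `merged` still has its 'City' column after the `drop()` call.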



### Example Use Case

Suppose we have a DataFrame of student test scores and we want to find the average score for each class:

```python
import pandas as pd

# Sample data
data = {
    'Student': ['Alice', 'Bob', 'Claire', 'David', 'Eve'],
    'Class': ['Math', 'Math', 'Science', 'Science', 'Math'],
    'Score': [85, 90, 95, 80, 88]
}

df = pd.DataFrame(data)

# Using groupby() to find the average score for each class
average_scores = df.groupby('Class')['Score'].mean()
print(average_scores)
```

Output:
```
Class
Math       87.666667
Science    87.500000
Name: Score, dtype: float64
```

In this example, `groupby()` is used to group the data by 'Class' and then calculate the mean 'Score' for each class. This is useful in educational data analysis to evaluate the performance of different classes.
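
If more than one statistic per class is needed, `agg()` can compute several at once. This is a small sketch reusing the same sample scores; the name `summary` is an assumption introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Claire', 'David', 'Eve'],
    'Class': ['Math', 'Math', 'Science', 'Science', 'Math'],
    'Score': [85, 90, 95, 80, 88],
})

# agg() with a list of function names returns one column per statistic
summary = df.groupby('Class')['Score'].agg(['mean', 'max', 'count'])
print(summary)
```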

## Q6. **Which of the following are mutable in nature: Series, DataFrame, Panel?**

In Pandas, both `Series` and `DataFrame` are mutable in nature. This means that you can modify the data contained within these structures after they have been created. You can change their values, add new rows or columns, and remove existing ones.

### Mutability in Pandas Structures:

- **Series**:
  - A `Series` is a one-dimensional array-like object that is mutable. You can change its values, add new elements, or remove existing ones.
  - Example:
    ```python
    import pandas as pd

    series = pd.Series([1, 2, 3])
    series[1] = 10  # Modifying an existing value
    series[3] = 4   # Adding a new value
    print(series)
    ```
    Output:
    ```
    0     1
    1    10
    2     3
    3     4
    dtype: int64
    ```

- **DataFrame**:
  - A `DataFrame` is a two-dimensional array-like object that is also mutable. You can modify its data, add or remove rows and columns.
  - Example:
    ```python
    import pandas as pd

    data = {
        'Name': ['Alice', 'Bob', 'Claire'],
        'Age': [25, 30, 27]
    }
    df = pd.DataFrame(data)
    df.loc[1, 'Age'] = 31  # Modifying an existing value
    df['Gender'] = ['Female', 'Male', 'Female']  # Adding a new column
    print(df)
    ```
    Output:
    ```
         Name  Age  Gender
    0   Alice   25  Female
    1     Bob   31    Male
    2  Claire   27  Female
    ```

- **Panel**:
  - `Panel` was a three-dimensional data structure in earlier versions of Pandas. It was deprecated in Pandas 0.20.0 (May 2017) and removed in version 0.25.0 (July 2019). For three-dimensional data, Pandas now recommends using a `MultiIndex` DataFrame or the `xarray` library.

Given the context and current versions of Pandas, the focus should be on `Series` and `DataFrame`, both of which are mutable. The concept of a `Panel` is no longer relevant in modern Pandas usage.
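
As a rough illustration of the `MultiIndex` alternative mentioned above, three-dimensional data can be stored in a DataFrame whose row index has two levels. The years, students, and scores below are invented for the example.

```python
import pandas as pd

# Hypothetical 3-D data: (Year, Student) rows by subject columns
index = pd.MultiIndex.from_product(
    [[2023, 2024], ['Alice', 'Bob']],
    names=['Year', 'Student'],
)
scores = pd.DataFrame(
    {'Math': [85, 90, 88, 92], 'Science': [78, 84, 80, 89]},
    index=index,
)

# Select one slice along the first dimension, similar to one item of a Panel
print(scores.loc[2024])

# Select a single cell by all three coordinates
print(scores.loc[(2024, 'Bob'), 'Math'])  # 92
```

Like any DataFrame, this structure stays mutable: values can be reassigned with `.loc` and new columns can be added.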

## Q7. **Create a DataFrame using multiple Series. Explain with an example.**

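```python
import pandas as pd

# Three Series that share the same default index (0, 1, 2)
names = pd.Series(['Alice', 'Bob', 'Claire'])
ages = pd.Series([25, 30, 27])
genders = pd.Series(['Female', 'Male', 'Female'])

# Each dictionary key becomes a column name; each Series becomes that column
df = pd.DataFrame({
    'Name': names,
    'Age': ages,
    'Gender': genders
})

print(df)
```

Output:
```
     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female
```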
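
When a DataFrame is built from several Series, pandas aligns them on their index labels instead of their positions. The sketch below uses hypothetical Series with partly different labels to show this, and uses `pd.concat` with `axis=1` as an alternative way to combine them.

```python
import pandas as pd

# Hypothetical Series with labeled, partly different indexes
ages = pd.Series([25, 30], index=['alice', 'bob'], name='Age')
cities = pd.Series(['Oslo', 'Lima'], index=['bob', 'claire'], name='City')

# The dict keys become column names; rows are the union of both indexes,
# and labels missing from one Series are filled with NaN
from_dict = pd.DataFrame({'Age': ages, 'City': cities})
print(from_dict)

# pd.concat with axis=1 places the Series side by side,
# taking each column name from the Series' own name
from_concat = pd.concat([ages, cities], axis=1)
print(from_concat)
```

Because 'alice' has no city and 'claire' has no age, those cells come out as `NaN`, which is a common surprise when the Series do not share the same index.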


## **Complete**