Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

In [1]:
import pandas as pd
data = [4,8,15,16,23,42]
series = pd.Series(data)
print(series)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


In [2]:
type(series)

pandas.core.series.Series

Q2. Create a variable of list type containing 10 elements, apply the pandas.Series function to it, and print the result.

In [3]:
data = [10,20,30,40,50,60,70,80,90,100]
series = pd.Series(data)
print(series)

0     10
1     20
2     30
3     40
4     50
5     60
6     70
7     80
8     90
9    100
dtype: int64


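Q3. Create a Pandas DataFrame that contains the following data:

| Name   | Age | Gender |
|--------|-----|--------|
| Alice  | 25  | Female |
| Bob    | 30  | Male   |
| Claire | 27  | Female |

Then, print the DataFrame.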

In [4]:
data = {
    'Name' : ['Alice' , 'Bob' , 'Claire'],
    'Age' : [25 , 30 , 27],
    'Gender' : ['Female' , 'Male' , 'Female']
}
df = pd.DataFrame(data)
print(df)

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


Q4. What is a ‘DataFrame’ in pandas and how is it different from a pandas.Series? Explain with an example.

In Pandas, a DataFrame is a two-dimensional labeled data structure, similar to a table in a relational database or an Excel spreadsheet. It consists of rows and columns, where each column can hold data of different types (e.g., numbers, strings, dates), and each row represents a single observation or record. DataFrames provide a convenient and efficient way to manipulate and analyze structured data.

On the other hand, a Pandas Series is a one-dimensional labeled array capable of holding data of any type. It can be thought of as a single column of a DataFrame. Each element in a Series has an associated label, referred to as an index, which allows for easy and efficient data access.
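The question asks for an example, so here is a minimal sketch of the difference. The variable names (`ages`, `people`, `age_column`) are illustrative, and the data mirrors the values used in Q3.

```python
import pandas as pd

# A Series: one-dimensional, a single labeled column of values
ages = pd.Series([25, 30, 27], name="Age")

# A DataFrame: two-dimensional, several labeled columns sharing one index
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Claire"],
    "Age": [25, 30, 27],
})

print(ages.ndim, ages.shape)      # 1 (3,)
print(people.ndim, people.shape)  # 2 (3, 2)

# Selecting a single column of a DataFrame returns a Series
age_column = people["Age"]
print(type(age_column).__name__)  # Series
```

Selecting `people["Age"]` returns a Series, which is why a Series can be thought of as one column of a DataFrame.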

Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can
you give an example of when you might use one of these functions?

Pandas provides a wide range of functions to manipulate data within a DataFrame. Some common functions include:

1. **`head()` and `tail()`**: These functions allow you to view the first or last few rows of the DataFrame, respectively. They're useful for quickly inspecting the data.

   Example:
   ```python
   df_head = df.head()  # View the first 5 rows
   df_tail = df.tail(3)  # View the last 3 rows
   ```

2. **`describe()`**: This function generates descriptive statistics for each numerical column in the DataFrame, such as mean, standard deviation, minimum, maximum, and quartiles.

   Example:
   ```python
   df_stats = df.describe()  # Generate summary statistics
   ```

3. **`sort_values()`**: This function allows you to sort the DataFrame by one or more columns.

   Example:
   ```python
   df_sorted = df.sort_values(by='Age')  # Sort by the 'Age' column
   ```

4. **`groupby()`**: This function is used to group the data based on one or more columns, allowing you to perform aggregate operations on each group.

   Example:
   ```python
   grouped_df = df.groupby('Gender')['Age'].mean()  # Calculate average age by gender
   ```

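5. **Boolean indexing (filtering rows)**: Passing a boolean condition inside `df[...]` keeps only the rows where the condition is true. Note that `DataFrame.filter()` selects rows or columns by label, not by value, so it is not the tool for condition-based filtering.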

   Example:
   ```python
   filtered_df = df[df['Age'] > 25]  # Filter rows where age is greater than 25
   ```

6. **`drop()`**: This function lets you remove columns or rows from the DataFrame.

   Example:
   ```python
   df_removed_column = df.drop(columns='Gender')  # Remove the 'Gender' column
   ```

7. **`fillna()`**: This function allows you to fill missing values in the DataFrame with specified values or methods.

   Example:
   ```python
   df_filled = df.fillna(0)  # Fill missing values with 0
   ```

8. **`apply()`**: Called on a Series, this function applies a function to each element. Called on a DataFrame, it applies a function along each column or row.

   Example:
   ```python
   def double_age(age):
       return age * 2
       
   df['Double Age'] = df['Age'].apply(double_age)  # Create a new column with doubled ages
   ```

9. **`pivot_table()`**: This function creates a pivot table from the DataFrame, allowing you to summarize and reshape the data.

   Example:
   ```python
   pivot_table = df.pivot_table(index='Gender', columns='Name', values='Age', aggfunc='mean')
   ```

These are just a few examples of the many functions available in Pandas for manipulating DataFrame data. The choice of function depends on the specific data manipulation task you need to perform.
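As an example of when these functions are useful, the sketch below cleans a missing value with `fillna()` and then summarizes the result with `groupby()` and `sort_values()`. The data is hypothetical (a fourth person, `Dan`, and a missing age are added purely for illustration).

```python
import pandas as pd

# Hypothetical data: one age is missing (None becomes NaN)
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Claire", "Dan"],
    "Age": [25, 30, None, 40],
    "Gender": ["Female", "Male", "Female", "Male"],
})

# Fill the missing age with the mean of the known ages
people["Age"] = people["Age"].fillna(people["Age"].mean())

# Average age per gender, sorted from the oldest group to the youngest
mean_age = people.groupby("Gender")["Age"].mean().sort_values(ascending=False)
print(mean_age)
```

Filling with the column mean is only one possible choice. Depending on the data, dropping the row or filling with a median may be more appropriate.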

Q6. Which of the following are mutable in nature: Series, DataFrame, Panel?

Among the options provided, both `Series` and `DataFrame` are mutable in Pandas. `Panel` was deprecated in version 0.20.0 and removed in version 0.25.0, so it is no longer available in current releases.

- **Series**: A Pandas Series is mutable, meaning you can modify its elements after it's created. You can change values, add new elements, and perform various operations on a Series.

- **DataFrame**: A Pandas DataFrame is also mutable. You can change values, add or remove columns, and perform various data manipulation operations on a DataFrame.

- **Panel**: The Panel was a three-dimensional data structure in earlier versions of Pandas, designed to hold data in a similar manner to a DataFrame but with an additional dimension. It was deprecated in Pandas 0.20.0 and removed entirely in 0.25.0 due to its complexity and limited use cases. Multi-index DataFrames or other data structures are recommended instead for handling higher-dimensional data.
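A short sketch of what mutability means in practice for the two structures that still exist. The variable names and values are illustrative.

```python
import pandas as pd

# Series: an existing value can be changed in place
s = pd.Series([4, 8, 15])
s.loc[0] = 100
print(s.tolist())  # [100, 8, 15]

# DataFrame: values can be changed and columns can be added in place
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})
df.loc[1, "Age"] = 31               # change a single cell
df["Gender"] = ["Female", "Male"]   # add a new column
print(df.shape)  # (2, 3)
```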

Q7. Create a DataFrame using multiple Series. Explain with an example.

In [6]:
name_series = pd.Series(['Alice', 'Bob', 'Claire'])
age_series = pd.Series([25, 30, 27])
gender_series = pd.Series(['Female', 'Male', 'Female'])

data = {
    'Name' : name_series,
    'Age' : age_series,
    'Gender' : gender_series
}

df = pd.DataFrame(data)
print(df)

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


In this example, we created three Series (name_series, age_series, and gender_series), each containing data for the 'Name', 'Age', and 'Gender' columns, respectively. Then, we combined these Series into a dictionary called data, where the keys are the column names and the values are the Series.

Finally, we used the pd.DataFrame() constructor to create a DataFrame named df from the data dictionary. The resulting DataFrame has three columns: 'Name', 'Age', and 'Gender', with the corresponding data from the Series.
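One detail worth knowing when combining Series this way is that Pandas aligns them by index label, not by position. The sketch below uses hypothetical string labels (`a`, `b`, `c`) to make that visible. Labels missing from one Series show up as `NaN` in the result.

```python
import pandas as pd

names = pd.Series(["Alice", "Bob", "Claire"], index=["a", "b", "c"])
ages = pd.Series([30, 25], index=["b", "a"])  # different order, no entry for "c"

# Values are matched on index labels, so Alice ("a") gets 25 and Bob ("b") gets 30
df = pd.DataFrame({"Name": names, "Age": ages})
print(df)
```

Because `ages` has no entry for label `c`, Claire's age is `NaN`, and the `Age` column is stored as floating point to accommodate it.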