Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

In [3]:
import pandas as pd
values = [4, 8, 15, 16, 23, 42]  # the data given in the question
s = pd.Series(values)
print(s)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


Q2. Create a list variable containing 10 elements, apply the pandas.Series function to it, and print the result.

In [4]:
import pandas as pd

my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
my_series = pd.Series(my_list)

print(my_series)


0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64


Q3. Create a Pandas DataFrame that contains the following data:

Name    Age  Gender
Alice   25   Female
Bob     30   Male
Claire  27   Female

Then, print the DataFrame.


In [5]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}

df = pd.DataFrame(data)

print(df)


     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


Q4. What is ‘DataFrame’ in pandas and how is it different from pandas.Series? Explain with an example.

In [None]:
In Pandas, both `DataFrame` and `Series` are fundamental data structures, but they serve different purposes:

1. **DataFrame**:
   - A DataFrame is a 2-dimensional, tabular data structure. It can be thought of as a collection of Series objects, where each column is a Series.
   - It is similar to a spreadsheet or a SQL table and is used to store and manipulate structured data with rows and columns.
   - DataFrames are suitable for representing data in a structured way, where you often have multiple attributes or features associated with each row of data.

   Example:

   ```python
   import pandas as pd

   data = {'Name': ['Alice', 'Bob', 'Claire'],
           'Age': [25, 30, 27],
           'Gender': ['Female', 'Male', 'Female']}

   df = pd.DataFrame(data)
   ```

   This creates a DataFrame with columns 'Name', 'Age', and 'Gender', where each column is a Series containing data of a specific type.

2. **Series**:
   - A Series is a 1-dimensional data structure in Pandas, similar to a column in a DataFrame.
   - It is essentially an array in which each element carries a label taken from the index. Labels are usually unique, although pandas does not require this.
   - Series are useful when you need to work with 1-dimensional data or when you want to extract a single column from a DataFrame.

   Example:

   ```python
   import pandas as pd

   ages = pd.Series([25, 30, 27])
   ```

   This creates a Series, assigned to the variable `ages`, with the values 25, 30, and 27. The Series has a default integer index starting from 0.

In summary, a DataFrame is a 2-dimensional structure that stores several attributes per row, while a Series is a 1-dimensional structure that stores a single labeled column of values. Use a DataFrame to organize and analyze a dataset with multiple variables, and a Series when you work with one attribute on its own or extract one column from a DataFrame.
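
The relationship between the two structures can be checked directly. The sketch below reuses the Q3 data: selecting one column with single brackets returns a Series, while selecting a list of columns returns a DataFrame. The variable names `age_column` and `age_frame` are chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female'],
})

age_column = df['Age']    # single brackets: a 1-dimensional Series
age_frame = df[['Age']]   # double brackets: a 2-dimensional DataFrame

print(type(age_column).__name__, age_column.ndim, age_column.shape)
print(type(age_frame).__name__, age_frame.ndim, age_frame.shape)
```

The Series reports one dimension and shape `(3,)`, while the one-column DataFrame reports two dimensions and shape `(3, 1)`.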

Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?

In [None]:
Pandas provides a wide range of functions for data manipulation within a DataFrame. Here are some common functions and examples of when you might use them:

1. **`head()` and `tail()`:**
   - Use `head(n)` to view the first n rows of the DataFrame, and `tail(n)` to view the last n rows.
   - Helpful for a quick look at the data's structure and content.

   ```python
   df.head(3)  # Show the first 3 rows of the DataFrame.
   ```

2. **`describe()`:**
   - Generates descriptive statistics, such as count, mean, standard deviation, and quartiles, for numerical columns.
   - Useful for understanding the distribution of your data.

   ```python
   df.describe()
   ```

3. **`info()`:**
   - Provides a concise summary of the DataFrame, including data types, non-null values, and memory usage.
   - Useful for data profiling and identifying missing values.

   ```python
   df.info()
   ```

4. **`groupby()`:**
   - Allows you to group data based on one or more columns and perform operations like aggregation, transformation, or filtering.
   - Useful for analyzing data by categories or groups.

   ```python
   df.groupby('Category')['Price'].mean()  # Calculate the average price per category.
   ```

5. **`sort_values()`:**
   - Sorts the DataFrame by one or more columns in ascending or descending order.
   - Useful for arranging data for analysis or presentation.

   ```python
   df.sort_values(by='Age', ascending=False)  # Sort the DataFrame by age in descending order.
   ```

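6. **Boolean indexing (row filtering):**
   - Selects the rows that satisfy a condition by indexing the DataFrame with a boolean mask, returning a new DataFrame with the matching rows.
   - Useful for working with a subset of your data. Note that `DataFrame.filter()` is a different tool: it selects rows or columns by their labels, not by their values.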

   ```python
   adults = df[df['Age'] >= 18]  # Create a DataFrame with rows where age is 18 or greater.
   ```

7. **`pivot_table()`:**
   - Creates a pivot table to summarize and reshape data, often used for cross-tabulation and summarization.
   - Useful for summarizing data by multiple dimensions.

   ```python
   pivot = df.pivot_table(index='Gender', columns='Category', values='Price', aggfunc='mean')
   ```

8. **`drop()`:**
   - Removes specified columns or rows from the DataFrame.
   - Useful for data cleaning and removing unnecessary information.

   ```python
   df.drop(columns=['Column1', 'Column2'], inplace=True)  # Remove specified columns.
   ```

These functions, among others, cover most everyday data manipulation and analysis tasks on a Pandas DataFrame.
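
The snippets above refer to columns such as 'Category' and 'Price' without defining a DataFrame. As a runnable sketch, the block below builds a small `sales` table (hypothetical data, invented only for this illustration) and applies a few of the listed operations to it.

```python
import pandas as pd

# Hypothetical sales data, invented only for this illustration.
sales = pd.DataFrame({
    'Category': ['Books', 'Toys', 'Books', 'Toys', 'Games'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Female'],
    'Price': [12.0, 20.0, 18.0, 10.0, 30.0],
})

# Average price per category.
mean_price = sales.groupby('Category')['Price'].mean()

# Rows ordered from most to least expensive.
by_price = sales.sort_values(by='Price', ascending=False)

# Boolean indexing: keep only the rows priced at 15 or more.
expensive = sales[sales['Price'] >= 15]

# drop() returns a new DataFrame by default; sales itself is unchanged.
without_gender = sales.drop(columns=['Gender'])

print(mean_price)
print(by_price.head(2))
print(expensive)
print(without_gender.columns.tolist())
```

Each call here returns a new object instead of modifying `sales`, which makes the steps easy to inspect one at a time.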

Q6. Which of the following is mutable in nature: Series, DataFrame, or Panel?

In [None]:
In Pandas, all three data structures (Series, DataFrame, and Panel) are value-mutable: the values they hold can be changed after creation. They differ in whether their size can change.

- **Series:** A Series is value-mutable but, according to the pandas documentation, size-immutable. You can overwrite existing elements in place, but its length is treated as fixed; operations that add or remove elements, such as `drop()` or `pd.concat()`, return a new Series.

- **DataFrame:** A DataFrame is both value-mutable and size-mutable. You can modify its contents and add or remove columns in place. It is designed to handle 2-dimensional data and is the most versatile and commonly used data structure in Pandas.

- **Panel:** A Panel was a 3-dimensional structure and was also mutable, but it was deprecated in pandas 0.20 and removed in pandas 0.25. For data with more than two dimensions, you would typically use a DataFrame with a MultiIndex instead.
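
Mutability can be checked directly. The following sketch, using made-up values, overwrites one element of a Series in place, then overwrites a cell and adds a column in an existing DataFrame. Panel is left out because it no longer exists in current pandas releases.

```python
import pandas as pd

# A Series is value-mutable: an existing element can be overwritten in place.
ages = pd.Series([25, 30, 27])
ages.loc[0] = 26

# A DataFrame is value-mutable and size-mutable.
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df.loc[1, 'Age'] = 31               # overwrite one cell
df['Gender'] = ['Female', 'Male']   # add a column in place

print(ages.tolist())
print(df.columns.tolist())
print(df.shape)
```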

Q7. Create a DataFrame using multiple Series. Explain with an example.

In [6]:
import pandas as pd

# Creating two Series
names = pd.Series(['Alice', 'Bob', 'Claire'])
ages = pd.Series([25, 30, 27])

# Creating a DataFrame using the Series
data = {'Name': names, 'Age': ages}
df = pd.DataFrame(data)

# Printing the DataFrame
print(df)


     Name  Age
0   Alice   25
1     Bob   30
2  Claire   27
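
When a DataFrame is built from a dictionary of Series, each Series becomes a column and the rows are matched by index label, not by position. The sketch below illustrates this with hypothetical labels 'a', 'b', and 'c' (not part of the original exercise): the `ages` Series has no entry for 'c', so that cell is filled with NaN. `pd.concat` along `axis=1` is shown as one alternative way to combine the same Series.

```python
import pandas as pd

# Hypothetical index labels, chosen only to show alignment.
names = pd.Series(['Alice', 'Bob', 'Claire'], index=['a', 'b', 'c'])
ages = pd.Series([25, 30], index=['a', 'b'])   # no entry for 'c'

# Rows are aligned on the index labels; the missing age becomes NaN.
df = pd.DataFrame({'Name': names, 'Age': ages})
print(df)

# Alternative: give each Series a name and concatenate them as columns.
df_concat = pd.concat([names.rename('Name'), ages.rename('Age')], axis=1)
print(df_concat)
```

Because a missing value is stored as NaN, the 'Age' column is converted to a floating-point type in both results.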
