Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

Ans--

Here's how you can create a Pandas Series containing the given data and print it:

In [1]:
import pandas as pd

data = [4, 8, 15, 16, 23, 42]
series = pd.Series(data)

print(series)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


Q2. Create a variable of list type containing 10 elements, apply the pandas.Series function to the variable, and print the result.

Ans--

Here's how you can create a list variable containing 10 elements, convert it to a Pandas Series with the pd.Series() function, and print it:

In [2]:
import pandas as pd

# Create a list with 10 elements
my_list = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Create a Pandas Series from the list
series = pd.Series(my_list)

# Print the series
print(series)

0     10
1     20
2     30
3     40
4     50
5     60
6     70
7     80
8     90
9    100
dtype: int64


Q3. Create a Pandas DataFrame that contains the following data:

Name    Age  Gender
Alice   25   Female
Bob     30   Male
Claire  27   Female

Then, print the DataFrame.

Ans--

Here's how you can create a Pandas DataFrame containing the given data and print it:

In [3]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}

df = pd.DataFrame(data)

print(df)

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


The printed DataFrame shows the column headers ('Name', 'Age', 'Gender'), the default integer index (0, 1, 2) as row labels, and one row of values per person.

Q4. What is a ‘DataFrame’ in pandas, and how is it different from pandas.Series? Explain with an example.

Ans--

In Pandas, a DataFrame is a two-dimensional labeled data structure that is used to store and manipulate tabular data. It is similar to a spreadsheet or a SQL table, where data is organized in rows and columns. Each column in a DataFrame can be of a different data type (e.g., integers, strings, floats), making it a versatile data structure for various data analysis tasks.

On the other hand, a Pandas Series is a one-dimensional labeled array that can hold data of any type. It's often thought of as a single column of a DataFrame. Each element in a Series is associated with an index label, which provides a way to access and manipulate the data.

Here's a more detailed explanation with an example:

Example using a Series:

Let's say we have a dataset containing the heights of three individuals:

In [4]:
import pandas as pd

heights = pd.Series([165, 180, 155], name='Height')
print(heights)

0    165
1    180
2    155
Name: Height, dtype: int64


In this case, heights is a Pandas Series: a one-dimensional structure holding the heights of three individuals. The index labels (0, 1, 2) are assigned automatically, and the name 'Height' labels the values as a whole (it would become the column name if the Series were placed in a DataFrame).

Example using a DataFrame:

Now, let's expand the dataset to include additional information about the individuals:

In [5]:
data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Height': [165, 180, 155],
    'Age': [25, 30, 27]
}

df = pd.DataFrame(data)
print(df)

     Name  Height  Age
0   Alice     165   25
1     Bob     180   30
2  Claire     155   27


In this example, df is a Pandas DataFrame. It's a two-dimensional data structure that stores the names, heights, and ages of individuals. Each column in the DataFrame (Name, Height, Age) represents a different variable, and each row represents an individual's information.

Key Differences:

1. Dimensions: A Series is one-dimensional, while a DataFrame is two-dimensional.
2. Columns: A DataFrame can have multiple columns, each with its own data type, whereas a Series holds a single sequence of values with one data type (dtype).
3. Indexing: Both Series and DataFrames have index labels for rows. In a Series, index labels are used to access individual elements. In a DataFrame, index labels are used to access rows or subsets of the data.
4. Use Cases: Series are often used to represent a single variable, while DataFrames are used to store and manipulate tabular data that involves multiple variables.
5. Syntax: Square-bracket indexing means different things for the two structures: on a DataFrame, df['Column'] selects a whole column (returned as a Series), whereas on a Series, series[label] selects a single element by its index label.
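The differences in dimensions, data types, and indexing listed above can be checked directly. A minimal sketch that reuses the three-person data from the examples (the variable name height_col is just illustrative):

```python
import pandas as pd

heights = pd.Series([165, 180, 155], name='Height')
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Height': [165, 180, 155],
    'Age': [25, 30, 27],
})

# Dimensions: a Series is 1-D, a DataFrame is 2-D
print(heights.ndim, heights.shape)  # 1 (3,)
print(df.ndim, df.shape)            # 2 (3, 3)

# A Series has a single dtype; a DataFrame has one dtype per column
print(heights.dtype)
print(df.dtypes)

# df['Height'] selects a whole column and returns a Series
height_col = df['Height']
print(type(height_col))

# series[label] selects one element by its index label
print(heights[1])  # 180
```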

In summary, a Series is a fundamental building block in Pandas that represents a one-dimensional array with an index. A DataFrame is a more complex structure that organizes multiple Series-like objects into a tabular format, similar to a table in a relational database or a spreadsheet.

Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?

Ans--

Pandas provides a wide range of functions for manipulating data in a DataFrame. Here are some common functions along with examples of when you might use them:

1. head() and tail(): These functions allow you to view the first few rows (head) or last few rows (tail) of a DataFrame.

In [6]:
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Claire'],
        'Age': [25, 30, 27],
        'Gender': ['Female', 'Male', 'Female']}

df = pd.DataFrame(data)

print(df.head())   # Display up to the first 5 rows (all 3 rows here)
print(df.tail(2))  # Display the last 2 rows

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female
     Name  Age  Gender
1     Bob   30    Male
2  Claire   27  Female


2. describe(): Provides summary statistics of the numeric columns in the DataFrame.

In [None]:
print(df.describe())
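By default, describe() summarizes only the numeric columns (here, just Age). Passing include='all' also summarizes the text columns, adding rows such as unique, top, and freq. A self-contained sketch that rebuilds the same df:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Claire'],
                   'Age': [25, 30, 27],
                   'Gender': ['Female', 'Male', 'Female']})

numeric_summary = df.describe()            # only the numeric 'Age' column
full_summary = df.describe(include='all')  # numeric and text (object) columns

print(numeric_summary)
print(full_summary)
```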

3. info(): Displays concise information about the DataFrame, including data types and non-null counts.

In [None]:
df.info()  # info() prints its summary directly and returns None, so no print() is needed

4. sort_values(): Sorts the DataFrame based on one or more columns.

In [None]:
sorted_df = df.sort_values(by='Age', ascending=False)
print(sorted_df)
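Since sort_values() accepts a list of columns, you can sort by several keys, each with its own direction. A sketch on the same data (the choice of keys is just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Claire'],
                   'Age': [25, 30, 27],
                   'Gender': ['Female', 'Male', 'Female']})

# Sort by Gender A-Z first, then by Age from oldest to youngest within each gender
multi_sorted = df.sort_values(by=['Gender', 'Age'], ascending=[True, False])
print(multi_sorted)
```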

5. groupby(): Groups data based on one or more columns and allows applying aggregate functions.

In [None]:
grouped = df.groupby('Gender')['Age'].mean()
print(grouped)
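groupby() can also apply several aggregate functions at once through agg(). A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Claire'],
                   'Age': [25, 30, 27],
                   'Gender': ['Female', 'Male', 'Female']})

# Mean, minimum, and count of Age within each Gender group
age_stats = df.groupby('Gender')['Age'].agg(['mean', 'min', 'count'])
print(age_stats)
```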

6. drop(): Removes specified columns or rows from the DataFrame.

In [None]:
df_without_age = df.drop(columns='Age')
print(df_without_age)
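drop() removes rows as well as columns: pass index labels with index= instead of columns=. Like the column example, it returns a new DataFrame and leaves df unchanged unless inplace=True is used. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Claire'],
                   'Age': [25, 30, 27],
                   'Gender': ['Female', 'Male', 'Female']})

df_without_bob = df.drop(index=1)                 # row label 1 is Bob
df_names_only = df.drop(columns=['Age', 'Gender'])  # several columns at once

print(df_without_bob)
print(df_names_only)
```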

7. fillna(): Fills missing values in the DataFrame with specified values or methods.

In [None]:
df_filled = df.fillna(value={'Age': 0})  # replace missing Age values with 0 (this df has none, so nothing changes)
print(df_filled)
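The df above has no missing values, so fillna() simply returns an unchanged copy. To see it at work, here is a sketch with a hypothetical missing Age (None becomes NaN in a numeric column). It fills with a constant, with the column mean, and by forward-filling with ffill(), which replaces the deprecated fillna(method='ffill') form in recent pandas versions:

```python
import pandas as pd

# Hypothetical data: Bob's age is missing
df_missing = pd.DataFrame({'Name': ['Alice', 'Bob', 'Claire'],
                           'Age': [25, None, 27]})

# Fill with a constant per column
filled_zero = df_missing.fillna(value={'Age': 0})

# Fill with a statistic computed from the column (mean of 25 and 27)
filled_mean = df_missing.fillna(value={'Age': df_missing['Age'].mean()})

# Forward-fill: copy the previous row's value down
filled_ffill = df_missing.ffill()

print(filled_zero, filled_mean, filled_ffill, sep='\n\n')
```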

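8. apply(): Applies a function along an axis of a DataFrame (to each column, or to each row with axis=1), or to each element of a Series. Here it is applied to the 'Age' column to derive an age category: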

In [7]:
def age_category(age):
    if age < 30:
        return 'Young'
    else:
        return 'Adult'

df['Age_Category'] = df['Age'].apply(age_category)
print(df)

     Name  Age  Gender Age_Category
0   Alice   25  Female        Young
1     Bob   30    Male        Adult
2  Claire   27  Female        Young
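On a whole DataFrame, apply() with axis=1 passes each row to the function as a Series, which helps when the result depends on several columns. A sketch, where describe_person is a hypothetical helper:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Claire'],
                   'Age': [25, 30, 27],
                   'Gender': ['Female', 'Male', 'Female']})

def describe_person(row):
    # row is a Series holding one row's values, indexed by column name
    return f"{row['Name']} ({row['Gender']}, {row['Age']})"

df['Summary'] = df.apply(describe_person, axis=1)
print(df['Summary'])
```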


9. pivot_table(): Creates a pivot table from the DataFrame, allowing you to summarize and aggregate data.

In [None]:
pivot_table = df.pivot_table(values='Age', index='Gender', columns='Age_Category', aggfunc='mean')
print(pivot_table)
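Combinations with no matching rows (for example, no female in the 'Adult' category) show up as NaN in the pivot table, and the fill_value parameter can replace them. A self-contained sketch that rebuilds Age_Category with the same under-30 threshold as age_category above:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Claire'],
                   'Age': [25, 30, 27],
                   'Gender': ['Female', 'Male', 'Female']})
df['Age_Category'] = ['Young' if age < 30 else 'Adult' for age in df['Age']]

# Mean age for each Gender / Age_Category combination
pivot = df.pivot_table(values='Age', index='Gender',
                       columns='Age_Category', aggfunc='mean')

# Same table, with empty combinations filled with 0 instead of NaN
pivot_filled = df.pivot_table(values='Age', index='Gender',
                              columns='Age_Category', aggfunc='mean', fill_value=0)
print(pivot)
print(pivot_filled)
```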

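10. merge(): Combines two DataFrames based on a common column (similar to a SQL JOIN). By default it performs an inner join, so only names found in both DataFrames (Alice and Bob) are kept.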

In [None]:
other_data = {'Name': ['Alice', 'Bob', 'Eve'],
              'Country': ['USA', 'Canada', 'UK']}

other_df = pd.DataFrame(other_data)
merged_df = df.merge(other_df, on='Name')
print(merged_df)
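The how parameter of merge() selects other join types, and indicator=True adds a _merge column showing which side each row came from. A sketch with a trimmed-down version of the same data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Claire'],
                   'Age': [25, 30, 27]})
other_df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Eve'],
                         'Country': ['USA', 'Canada', 'UK']})

inner = df.merge(other_df, on='Name')             # default: only Alice and Bob
left = df.merge(other_df, on='Name', how='left')  # every row of df; Claire's Country is NaN
outer = df.merge(other_df, on='Name', how='outer', indicator=True)  # everyone from both

print(outer)
```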

These are just a few examples of the many functions available in Pandas for data manipulation. Depending on your specific analysis or data cleaning needs, you can choose the appropriate functions to transform and manipulate your DataFrame.

Q6. Which of the following is mutable in nature: Series, DataFrame, or Panel?

Ans--

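Among the options provided, Series and DataFrame are both mutable. The pandas documentation describes its data structures as value-mutable (the values they contain can be changed) but not all of them as size-mutable:

- A Series is value-mutable: you can change its existing values in place after it has been created. Strictly, the documentation describes a Series as not size-mutable, although assigning a value to a new label with .loc does append an element.

- A DataFrame is both value- and size-mutable: you can modify its data, insert or remove columns, and perform various data manipulation operations on it.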
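A short sketch of the in-place changes described above (the values and labels are arbitrary):

```python
import pandas as pd

s = pd.Series([4, 8, 15])
s[0] = 100     # change an existing value in place
s.loc[3] = 16  # assign to a new label: the Series grows to 4 elements

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df.loc[0, 'Age'] = 26              # change a value in place
df['Country'] = ['USA', 'Canada']  # insert a new column
del df['Name']                     # remove a column

print(s)
print(df)
```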

Panel, the third option, was deprecated in pandas 0.20 and removed in pandas 0.25, so it is not available in current versions (while it existed, it was also value- and size-mutable). Multi-dimensional data is now usually handled with a DataFrame that has a hierarchical index (MultiIndex), so Panel is rarely a relevant consideration today.
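As a sketch of the MultiIndex approach, data that might once have gone into a 3-D Panel (here, a hypothetical sales table by year, store, and product) can be kept in a DataFrame whose row index has two levels:

```python
import pandas as pd

# Hypothetical data: two years x two stores as a two-level row index
index = pd.MultiIndex.from_product([[2023, 2024], ['StoreA', 'StoreB']],
                                   names=['year', 'store'])
sales = pd.DataFrame({'apples': [10, 12, 14, 9],
                      'pears': [5, 7, 6, 8]}, index=index)

# Select one "slice" along the outer level, like picking one item of a Panel
sales_2023 = sales.loc[2023]  # rows for 2023, now indexed by store
print(sales_2023)

# Move the inner index level into the columns for a wide view of one product
wide = sales['apples'].unstack('store')
print(wide)
```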

Q7. Create a DataFrame using multiple Series. Explain with an example.

Ans--

You can create a DataFrame using multiple Series by passing the Series as a dictionary to the pd.DataFrame() constructor. Each Series will become a column in the resulting DataFrame. Here's an example: