# Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

# Ans :

In [1]:
import pandas as pd

data = [4, 8, 15, 16, 23, 42]
series = pd.Series(data)

print(series)


0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


# Q2. Create a variable of list type containing 10 elements, apply the pandas.Series function to it, and print the result.

# Ans :

In [2]:
import pandas as pd

fruits_vegetables = ['apple', 'banana', 'carrot', 'spinach', 'orange', 'broccoli', 'grape', 'tomato', 'lettuce', 'pear']
series = pd.Series(fruits_vegetables)

print(series)


0       apple
1      banana
2      carrot
3     spinach
4      orange
5    broccoli
6       grape
7      tomato
8     lettuce
9        pear
dtype: object


# Q3. Create a Pandas DataFrame that contains the following data:
| Name   | Age | Gender |
|--------|-----|--------|
| Alice  | 25  | Female |
| Bob    | 30  | Male   |
| Claire | 27  | Female |
  
Then, print the DataFrame.

# Ans :

In [4]:
import pandas as pd

# Data exactly as specified in the question
data = {'Name': ['Alice', 'Bob', 'Claire'],
        'Age': [25, 30, 27],
        'Gender': ['Female', 'Male', 'Female']}

df = pd.DataFrame(data)

print(df)


     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


# Q4. What is a ‘DataFrame’ in pandas and how is it different from a pandas.Series? Explain with an example.

# Ans :

In pandas, a DataFrame is a two-dimensional data structure that represents tabular data in the form of rows and columns. It can be seen as a collection of Series objects, where each Series represents a column in the DataFrame. A DataFrame provides a powerful and flexible way to store, manipulate, and analyze structured data.

Here are some key points differentiating a DataFrame from a pandas Series:

1. Dimensionality: A Series is a one-dimensional labeled array, whereas a DataFrame is a two-dimensional structure.

2. Structure: A Series represents a single column of data, while a DataFrame is a collection of columns where each column can be of a different data type. DataFrame allows for storing and working with multiple variables or features simultaneously.

3. Indexing: A Series has a single index that labels each element. A DataFrame has two axes of labels: the row index (df.index) and the column index (df.columns). The row index gives access to individual rows, while the column index gives access to specific columns or groups of columns.

4. Flexibility: While a Series is more suitable for working with one-dimensional data and performing operations on a single column, a DataFrame provides additional functionalities for handling tabular data. With a DataFrame, you can perform operations across columns, filter rows based on conditions, join or merge multiple DataFrames, pivot data, and more.


To illustrate the difference, consider the following example:

In [6]:
import pandas as pd

# Creating a Series
series = pd.Series([1, 2, 3, 4, 5])
print("Series:")
print(series)
print()

# Creating a DataFrame
data = {'Name': ['Rajib', 'Mohan', 'Ankita'],
        'Age': [25, 30, 27],
        'Gender': ['Male', 'Male', 'Female']}
df = pd.DataFrame(data)
print("DataFrame:")
print(df)


Series:
0    1
1    2
2    3
3    4
4    5
dtype: int64

DataFrame:
     Name  Age  Gender
0   Rajib   25    Male
1   Mohan   30    Male
2  Ankita   27  Female
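
The relationship between the two structures can also be checked directly. The following is a minimal sketch that reuses the sample data from the example above; the variable names `age` and `age_frame` are illustrative only.

```python
import pandas as pd

# Same sample data as the example above
df = pd.DataFrame({'Name': ['Rajib', 'Mohan', 'Ankita'],
                   'Age': [25, 30, 27],
                   'Gender': ['Male', 'Male', 'Female']})

# Selecting one column returns a one-dimensional Series
age = df['Age']
print(type(age).__name__)        # Series
print(age.ndim, df.ndim)         # 1 2

# A Series has one shape dimension; a DataFrame has (rows, columns)
print(age.shape)                 # (3,)
print(df.shape)                  # (3, 3)

# Selecting with a list of column names keeps the result two-dimensional
age_frame = df[['Age']]
print(type(age_frame).__name__)  # DataFrame

# Each column of a DataFrame can have its own dtype
print(df.dtypes)
```

The extracted Series keeps the row index of the DataFrame it came from, which is why a DataFrame can be thought of as a set of Series sharing one index.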


# Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?

# Ans :

Pandas provides a wide range of functions and methods to manipulate data in a DataFrame. Here are some common functions that are frequently used:

1. head(): This function allows you to view the first few rows of the DataFrame. It is useful for inspecting the data and understanding its structure.

2. tail(): This function is similar to head(), but it allows you to view the last few rows of the DataFrame.

3. info(): This function provides a summary of the DataFrame, including the column names, data types, and the number of non-null values in each column. It helps in understanding the data types and identifying missing values.

4. describe(): This function generates descriptive statistics of the numerical columns in the DataFrame, such as count, mean, standard deviation, minimum, maximum, and quartile values. It provides a quick overview of the distribution and summary statistics of the data.

5. shape: This attribute returns a tuple representing the dimensions of the DataFrame (number of rows, number of columns).

6. columns: This attribute returns the column labels of the DataFrame as a pandas Index object (use list(df.columns) to get a plain Python list).

7. values: This attribute returns a 2D NumPy array containing the underlying data in the DataFrame.

8. loc[] and iloc[]: These indexing methods allow you to access and manipulate specific rows and columns in the DataFrame using labels (loc[]) or integer-based indexing (iloc[]).

9. drop(): This function allows you to remove rows or columns from the DataFrame based on specified labels or indices.

10. fillna(): This function allows you to fill missing values in the DataFrame with a specified value or a calculated value, such as the mean or median.

11. groupby(): This function is used for grouping the data based on one or more columns. It enables performing aggregations, such as sum, mean, count, etc., on the grouped data.

12. sort_values(): This function allows you to sort the DataFrame based on one or more columns, either in ascending or descending order.
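
One detail from the list that is easy to get wrong is how loc[] and iloc[] treat slice endpoints. The sketch below uses a small hypothetical DataFrame with string row labels (the labels and values are made up for illustration).

```python
import pandas as pd

# Hypothetical sample data with string row labels
df = pd.DataFrame({'Age': [25, 30, 27, 32]},
                  index=['a', 'b', 'c', 'd'])

# loc[] slices by label and includes the end label
by_label = df.loc['b':'c']
print(by_label)

# iloc[] slices by integer position and excludes the end position
by_position = df.iloc[1:2]
print(by_position)
```

Here the label slice returns rows 'b' and 'c', while the positional slice returns only row 'b'.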

These are just a few examples of common functions used to manipulate data in a Pandas DataFrame. Pandas provides a comprehensive set of functions and methods that enable data cleaning, transformation, filtering, aggregation, and various other operations to efficiently work with and analyze tabular data.
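
As one example of when these functions are useful, consider a dataset with a missing value that must be cleaned before computing a summary. The following sketch assumes hypothetical survey data and chooses the median as the fill value; that choice is an assumption for illustration, not a general recommendation.

```python
import pandas as pd
import numpy as np

# Hypothetical survey data with one missing age
df = pd.DataFrame({'Name': ['Ankita', 'Binodh', 'Chandana', 'Dipankar', 'Ela'],
                   'Age': [25, np.nan, 27, 32, 28],
                   'Gender': ['Female', 'Male', 'Female', 'Male', 'Female']})

# Count missing values per column before cleaning
print(df.isna().sum())

# Fill the missing age with the median of the known ages
median_age = df['Age'].median()
cleaned = df.fillna({'Age': median_age})

# Average age per gender on the cleaned data, highest first
mean_age = cleaned.groupby('Gender')['Age'].mean().sort_values(ascending=False)
print(mean_age)
```

fillna() returns a new DataFrame here, so the original df still contains the missing value and can be compared against the cleaned copy.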







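The cell below applies each of these functions to a small sample DataFrame that has one missing Age value:

In [7]: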
import pandas as pd
import numpy as np

# Creating a sample DataFrame
data = {'Name': ['Ankita', 'Binodh', 'Chandana', 'Dipankar', 'Ela'],
        'Age': [25, np.nan, 27, 32, 28],
        'Gender': ['Female', 'Male', 'Female', 'Male', 'Female']}
df = pd.DataFrame(data)

# View the first few rows using head()
print("First few rows:")
print(df.head())
print()

# View the last few rows using tail()
print("Last few rows:")
print(df.tail())
print()

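# Get summary information using info()
# Note: info() prints the summary itself and returns None,
# so wrapping it in print() also prints "None" afterwards
print("Summary information:")
print(df.info())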
print()

# Generate descriptive statistics using describe()
print("Descriptive statistics:")
print(df.describe())
print()

# Get the dimensions of the DataFrame using shape
print("Dimensions:")
print(df.shape)
print()

# Get the column names using columns
print("Column names:")
print(df.columns)
print()

# Get the underlying data as a 2D array using values
print("Underlying data:")
print(df.values)
print()

# Access specific rows and columns using loc[]
print("Access specific rows and columns:")
print(df.loc[1:3, 'Name':'Age'])
print()

# Remove rows or columns using drop()
print("DataFrame after dropping 'Gender' column:")
df_dropped = df.drop('Gender', axis=1)
print(df_dropped)
print()

# Fill missing values using fillna()
print("DataFrame after filling missing values:")
df_filled = df.fillna({'Age': df['Age'].mean()})
print(df_filled)
print()

# Group the data and perform aggregation using groupby()
print("Grouped data:")
grouped_data = df.groupby('Gender')['Age'].mean()
print(grouped_data)
print()

# Sort the DataFrame based on 'Age' column using sort_values()
print("Sorted DataFrame:")
df_sorted = df.sort_values('Age', ascending=False)
print(df_sorted)


First few rows:
       Name   Age  Gender
0    Ankita  25.0  Female
1    Binodh   NaN    Male
2  Chandana  27.0  Female
3  Dipankar  32.0    Male
4       Ela  28.0  Female

Last few rows:
       Name   Age  Gender
0    Ankita  25.0  Female
1    Binodh   NaN    Male
2  Chandana  27.0  Female
3  Dipankar  32.0    Male
4       Ela  28.0  Female

Summary information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Name    5 non-null      object 
 1   Age     4 non-null      float64
 2   Gender  5 non-null      object 
dtypes: float64(1), object(2)
memory usage: 248.0+ bytes
None

Descriptive statistics:
            Age
count   4.00000
mean   28.00000
std     2.94392
min    25.00000
25%    26.50000
50%    27.50000
75%    29.00000
max    32.00000

Dimensions:
(5, 3)

Column names:
Index(['Name', 'Age', 'Gender'], dtype='object')

Underlying data:
[['Ankita' 25.0 

# Q6. Which of the following are mutable in nature: Series, DataFrame, Panel?

# Ans :

All three are value-mutable: the values they contain can be changed after the object is created. They differ in whether their size can change, and Panel no longer exists in current versions of pandas.

In pandas, a mutable object is one that can be modified after it is created. Here's a breakdown of the mutability of each object:

1. Series: Value-mutable

    * You can update the values of a Series in place using assignment. The pandas documentation describes the length of a Series as fixed (size-immutable), so it is the values, not the size, that are treated as mutable.
    
2. DataFrame: Value-mutable and size-mutable

    * You can update specific cells, add or remove columns, or apply functions to transform the existing data, all on the same object.

3. Panel: Value-mutable and size-mutable, but removed

    * A pandas Panel was used in earlier versions of pandas for handling three-dimensional data, and its documentation described it as a size-mutable container. Panel was deprecated in version 0.20.0 and removed in version 0.25.0 in favor of alternatives such as MultiIndex DataFrames or the xarray package.
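
In-place modification of a Series and a DataFrame can be demonstrated with a short sketch. The data and the 'City' column below are hypothetical, and Panel is left out because it is not available in current pandas releases.

```python
import pandas as pd

s = pd.Series([4, 8, 15])
df = pd.DataFrame({'Name': ['Ravi', 'Priya'], 'Age': [32, 28]})

# Overwrite an existing Series value in place
s.iloc[0] = 40
print(s.tolist())          # [40, 8, 15]

# Overwrite a single DataFrame cell in place
df.loc[0, 'Age'] = 33

# Insert a new column, then delete one, on the same object
df['City'] = ['Pune', 'Delhi']
del df['Name']
print(list(df.columns))    # ['Age', 'City']
print(df.shape)            # (2, 2)
```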

# Q7. Create a DataFrame using multiple Series. Explain with an example.

# Ans :

To create a DataFrame using multiple Series, you can combine the Series objects into a dictionary and pass it to the pd.DataFrame() constructor. Each dictionary key becomes a column name and each Series becomes that column; the Series are aligned on their index labels.

Here's an example that demonstrates how to create a DataFrame using multiple Series:

In [10]:
import pandas as pd

# Creating Series objects
name_series = pd.Series(['Ravi', 'Priya', 'Aarav'])
age_series = pd.Series([32, 28, 25])
gender_series = pd.Series(['Male', 'Female', 'Male'])

# Printing the Series
print("Series:")
print(name_series)
print(age_series)
print(gender_series)

# Creating a DataFrame using multiple Series
data = {'Name': name_series,
        'Age': age_series,
        'Gender': gender_series}
df = pd.DataFrame(data)

# Printing the DataFrame
print("Combined into DataFrame:")
print(df)


Series:
0     Ravi
1    Priya
2    Aarav
dtype: object
0    32
1    28
2    25
dtype: int64
0      Male
1    Female
2      Male
dtype: object
Combined into DataFrame:
    Name  Age  Gender
0   Ravi   32    Male
1  Priya   28  Female
2  Aarav   25    Male
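
The example above works cleanly because all three Series share the default index 0, 1, 2. When the indexes differ, pandas matches rows by label rather than by position. The following is a minimal sketch with hypothetical Series whose labels only partly overlap; it also shows pd.concat() as an alternative way to combine them.

```python
import pandas as pd

# Hypothetical Series with partially overlapping labels
age = pd.Series([32, 28], index=['Ravi', 'Priya'])
city = pd.Series(['Delhi', 'Pune'], index=['Priya', 'Aarav'])

# The dict constructor aligns rows on the union of the index labels;
# labels missing from one Series get NaN in that column
df = pd.DataFrame({'Age': age, 'City': city})
print(df)

# pd.concat along axis=1 builds the same columns from the same Series
df2 = pd.concat([age, city], axis=1, keys=['Age', 'City'])
print(df2)
```

Only 'Priya' appears in both Series, so that is the only row with values in both columns. The row order of the two results may differ, so it is safer to look rows up by label than by position.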
