# Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

In [1]:
import pandas as pd

# create a list of data
data = [4, 8, 15, 16, 23, 42]

# create a pandas Series object from the data
series = pd.Series(data)

# print the Series object
print(series)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


# Q2. Create a variable of list type containing 10 elements, apply the pandas.Series function to it, and print the result.

In [2]:
import pandas as pd
# create a list of elements
elements = [14,24,34,44,54,64,74,84,94,4]
series = pd.Series(elements)
print(series)

0    14
1    24
2    34
3    44
4    54
5    64
6    74
7    84
8    94
9     4
dtype: int64


# Q3. Create a Pandas DataFrame with the columns Name, Age, and Gender, then print the DataFrame.

In [3]:
import pandas as pd

# create a dictionary of data
data = {'Name': ['Alice', 'Bob', 'Claire'],
        'Age': [25, 30, 27],
        'Gender': ['Female', 'Male', 'Female']
        }

# create a pandas DataFrame object from the data
df = pd.DataFrame(data)

# print the DataFrame object
print(df)

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


# Q4. What is a 'DataFrame' in pandas and how is it different from pandas.Series? Explain with an example.

A4. In pandas, a DataFrame is a two-dimensional table-like data structure that contains rows and columns, similar to a spreadsheet or SQL table. It is one of the most commonly used data structures in pandas, and is designed to handle a wide range of data formats and operations. Each column in a DataFrame is a pandas Series object, which is a one-dimensional array-like object that contains a sequence of values and an index.

The main difference between a pandas Series and a DataFrame is that a Series represents a single column of data, whereas a DataFrame represents a collection of columns (i.e. multiple Series sharing one row index) that can be used together to perform complex data operations. In other words, a Series is a one-dimensional labeled array with a single data type, while a DataFrame is a two-dimensional labeled table whose columns can each have a different data type.

Here's an example to illustrate the difference between a pandas Series and a DataFrame:

In [4]:
import pandas as pd

# create a dictionary of data
data = {'Name': ['John', 'Anna', 'Peter', 'Linda', 'Sophie'],
        'Age': [28, 25, 33, 42, 19],
        'Country': ['USA', 'UK', 'Germany', 'France', 'Canada'],
        'Salary': [60000, 45000, 80000, 70000, 55000]}

# create a pandas DataFrame object from the data
df = pd.DataFrame(data)

# create a pandas Series object from the 'Age' column of the DataFrame
age_series = df['Age']

# print the DataFrame object and the Series object
print('DataFrame:')
print(df)
print('\nSeries:')
print(age_series)


DataFrame:
     Name  Age  Country  Salary
0    John   28      USA   60000
1    Anna   25       UK   45000
2   Peter   33  Germany   80000
3   Linda   42   France   70000
4  Sophie   19   Canada   55000

Series:
0    28
1    25
2    33
3    42
4    19
Name: Age, dtype: int64


In this example, we created a pandas DataFrame object from a dictionary of data, which contains four columns ('Name', 'Age', 'Country', and 'Salary') and five rows of data. We then extracted a single column ('Age') from the DataFrame using the indexing operator ([]), which returned a pandas Series object that contains the values from the 'Age' column. Finally, we printed both the DataFrame and the Series objects.

As you can see from the output, the DataFrame is a two-dimensional table-like structure that contains all of the data from the original dictionary, while the Series is a one-dimensional array-like object that contains only the data from a single column of the DataFrame. While both objects share many of the same methods and attributes, they are different data structures and are used in different ways depending on the task at hand.
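The one-dimensional versus two-dimensional distinction can also be checked directly. The following is a minimal sketch with made-up values; it assumes only that single brackets select one column as a Series, while a list of column names selects a DataFrame.

```python
import pandas as pd

# Small illustrative frame (values are made up for this sketch)
df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 25]})

age_series = df['Age']     # single brackets: one column as a Series
age_frame = df[['Age']]    # list of column names: a one-column DataFrame

# ndim is 1 for a Series and 2 for a DataFrame
print(type(age_series).__name__, age_series.ndim, age_series.shape)
print(type(age_frame).__name__, age_frame.ndim, age_frame.shape)
```

Even with a single column, the second selection keeps its two-dimensional shape, which matters when a later step expects a DataFrame rather than a Series.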

# Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?

A5. Pandas provides a wide range of functions that can be used to manipulate data in a DataFrame. Some of the common functions include:

head() and tail(): These functions allow you to view the first or last n rows of a DataFrame. For example, df.head(10) will return the first 10 rows of the DataFrame df.

describe(): This function provides a summary of the descriptive statistics of a DataFrame. For example, df.describe() will return the count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum for each numeric column in the DataFrame df.

dropna(): This function allows you to remove rows or columns with missing values (i.e. NaN values) from a DataFrame. For example, df.dropna() will return a new DataFrame with all rows containing missing values removed.

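fillna(): This function allows you to fill missing values in a DataFrame with a specified value, either a single scalar or a per-column mapping. For example, df.fillna(0) will return a copy of the DataFrame df with all missing values replaced by 0. To propagate neighboring values instead, use ffill() or bfill().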

groupby(): This function allows you to group a DataFrame by one or more columns and apply a function to each group. For example, df.groupby('Country')['Salary'].mean() will group the DataFrame df by the 'Country' column and calculate the mean salary for each group.

sort_values(): This function allows you to sort a DataFrame by one or more columns in ascending or descending order. For example, df.sort_values(by='Salary', ascending=False) will sort the DataFrame df by the 'Salary' column in descending order.

apply(): This function allows you to apply a function along the rows or columns of a DataFrame, or to each element of a Series. For example, df['Name'].apply(lambda x: x.upper()) will apply the upper() method to each element in the 'Name' column of the DataFrame df. For simple string operations like this one, df['Name'].str.upper() gives the same result.
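As a rough illustration of several of the functions listed above, the sketch below runs head(), describe(), sort_values(), and apply() on a small DataFrame. The data mirrors the earlier employee example, and the variable names are chosen only for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'Sophie'],
    'Age': [28, 25, 33, 42, 19],
    'Salary': [60000, 45000, 80000, 70000, 55000],
})

first_two = df.head(2)                                     # first 2 rows
summary = df.describe()                                    # statistics for numeric columns
by_salary = df.sort_values(by='Salary', ascending=False)   # highest salary first
upper_names = df['Name'].apply(lambda x: x.upper())        # element-wise on a Series

print(first_two)
print(summary.loc[['count', 'mean', 'min', 'max']])
print(by_salary['Name'].tolist())
print(upper_names.tolist())
```

None of these calls modifies df; each returns a new object, so the results need to be assigned to a variable if they are used later.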

One example of when you might use one of these functions is when you have a DataFrame with missing values, and you want to remove or fill those missing values before performing any analysis or modeling. You can use the dropna() or fillna() functions to achieve this. For example, if you have a DataFrame df with missing values in the 'Salary' column, you can use the fillna() function to fill those missing values with the mean salary of the non-missing values:

In [5]:
import pandas as pd

# create a DataFrame with missing values
data = {'Name': ['John', 'Anna', 'Peter', 'Linda', 'Sophie'],
        'Age': [28, 25, 33, 42, 19],
        'Country': ['USA', 'UK', 'Germany', 'France', 'Canada'],
        'Salary': [60000, 45000, None, 70000, None]}
df = pd.DataFrame(data)

# fill missing values with the mean salary
mean_salary = df['Salary'].mean()
df['Salary'] = df['Salary'].fillna(mean_salary)

# print the updated DataFrame
print(df)


     Name  Age  Country        Salary
0    John   28      USA  60000.000000
1    Anna   25       UK  45000.000000
2   Peter   33  Germany  58333.333333
3   Linda   42   France  70000.000000
4  Sophie   19   Canada  58333.333333


In this example, we used the fillna() function to fill the missing values in the 'Salary' column of the DataFrame df with the mean salary of the non-missing values. The resulting DataFrame has no missing values in the 'Salary' column, which can now be used for further analysis or modeling.
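If removing incomplete rows is preferable to filling them, dropna() is the alternative mentioned above. The following is a minimal sketch on the same kind of data; restricting the check to the 'Salary' column with the subset parameter is a choice made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'Sophie'],
    'Salary': [60000, 45000, None, 70000, None],
})

# Keep only rows where 'Salary' is present; df itself is not modified
complete_rows = df.dropna(subset=['Salary'])

print(complete_rows['Name'].tolist())   # names of the rows that were kept
print(df['Salary'].isna().sum())        # missing values still in the original
```

Dropping rows discards the other columns of those rows as well, so filling is often the better option when the remaining data is still useful.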

Another example of when you might use one of these functions is when you want to group a DataFrame by one or more columns and perform some analysis on each group. You can use the groupby() function to achieve this. For example, if you have a DataFrame df with information about employees in different countries, you can calculate the average salary for each country:

In [6]:
import pandas as pd

# create a DataFrame with employee data
data = {'Name': ['John', 'Anna', 'Peter', 'Linda', 'Sophie'],
        'Age': [28, 25, 33, 42, 19],
        'Country': ['USA', 'UK', 'Germany', 'France', 'Canada'],
        'Salary': [60000, 45000, 55000, 70000, 50000]}
df = pd.DataFrame(data)

# calculate the average salary for each country
avg_salary_by_country = df.groupby('Country')['Salary'].mean()

# print the result
print(avg_salary_by_country)


Country
Canada     50000.0
France     70000.0
Germany    55000.0
UK         45000.0
USA        60000.0
Name: Salary, dtype: float64


In this example, we used the groupby() function to group the DataFrame df by the 'Country' column, and calculated the average salary for each group using the mean() method. The resulting avg_salary_by_country variable is a Series object that contains the average salary for each country, which can be used for further analysis or visualization.
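Because each country appears only once in the data above, every group mean equals that single salary. The sketch below uses hypothetical data with two employees per country (the extra employee 'Marie' and the country assignments are made up) and computes several statistics at once with agg().

```python
import pandas as pd

# Hypothetical data: two employees per country
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'Sophie', 'Marie'],
    'Country': ['USA', 'UK', 'USA', 'France', 'UK', 'France'],
    'Salary': [60000, 45000, 80000, 70000, 55000, 50000],
})

# One row per country, one column per aggregate
stats = df.groupby('Country')['Salary'].agg(['mean', 'max', 'count'])

print(stats)
```

Here the result is a DataFrame indexed by country, with one column for each requested aggregate.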

# Q6. Which of the following is mutable in nature Series, DataFrame, Panel?

A6. Series and DataFrame are both mutable: you can change the values they hold after they have been created. A DataFrame is also size-mutable, so columns and rows can be added or removed in place.

For example, you can update a specific element with indexing (df.loc[0, 'Age'] = 30) or add a new column with assignment (df['Bonus'] = 0). Note that the assign() method does not modify the original DataFrame; it returns a new DataFrame that includes the extra column.

Panel was also mutable, but it is no longer part of pandas. It was deprecated in pandas 0.20.0 and removed in pandas 0.25.0, as its functionality can be achieved with other pandas objects such as a DataFrame with a MultiIndex.
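Mutability can be demonstrated with a short sketch. The values and the column names 'Senior' and 'AgePlusOne' are made up for illustration; the sketch contrasts in-place assignment with assign(), which returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Series: update a value in place
s = pd.Series([4, 8, 15])
s.loc[0] = 99

# DataFrame: update a value and add a column in place
df = pd.DataFrame({'Age': [28, 25]})
df.loc[0, 'Age'] = 30
df['Senior'] = df['Age'] > 26

# assign() returns a new DataFrame; df does not gain the column
df2 = df.assign(AgePlusOne=df['Age'] + 1)

print(s.tolist())
print(list(df.columns))
print(list(df2.columns))
```

The original df ends up with the columns 'Age' and 'Senior' only, while df2 also carries 'AgePlusOne'.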