# Pandas Basic Assignment

### Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

```python
import pandas as pd

# Create the Pandas Series
series = pd.Series([4, 8, 15, 16, 23, 42])

# Print the Series
print(series)

# Print the type (class) of the Series object
print(type(series))
```

```text
0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64
<class 'pandas.core.series.Series'>
```
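A Series pairs its values with an index. The short sketch below inspects the pieces of the Series created above; the calls (`index`, `to_numpy()`, `dtype`, `loc`) are standard pandas, and the comments describe what each one returns.

```python
import pandas as pd

series = pd.Series([4, 8, 15, 16, 23, 42])

# No index was given, so pandas assigns a default RangeIndex 0..5
print(series.index)
# The underlying values as a NumPy array
print(series.to_numpy())
# The single element type shared by all values (an integer dtype here)
print(series.dtype)
# Label-based access: the element whose index label is 2
print(series.loc[2])  # 15
```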


### Q2. Create a variable of list type containing 10 elements, apply the `pandas.Series` function to the variable, and print the result.

```python
import pandas as pd

# Create a list with 10 elements
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Convert the list into a pandas Series
series = pd.Series(my_list)

# Print the pandas Series
print(series)
```

```text
0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64
```


### Q3. Create a Pandas DataFrame that contains the following data, then print the DataFrame.

| Name   | Age | Gender |
|--------|-----|--------|
| Alice  | 25  | Female |
| Bob    | 30  | Male   |
| Claire | 27  | Female |

```python
import pandas as pd

# Create a dictionary mapping each column name to its values
data = {'Name': ['Alice', 'Bob', 'Claire'], 'Age': [25, 30, 27], 'Gender': ['Female', 'Male', 'Female']}

# Convert the dictionary into a pandas DataFrame
df = pd.DataFrame(data)

# Print the pandas DataFrame
print(df)
```

```text
     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female
```
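The same DataFrame can also be built row by row, which mirrors the table layout in the question. This is a sketch of an alternative constructor call, not a required step; `rows` and `df_rows` are illustrative names.

```python
import pandas as pd

# Each inner list is one row, in the order Name, Age, Gender
rows = [
    ['Alice', 25, 'Female'],
    ['Bob', 30, 'Male'],
    ['Claire', 27, 'Female'],
]

# columns= supplies the header row; the index defaults to 0, 1, 2
df_rows = pd.DataFrame(rows, columns=['Name', 'Age', 'Gender'])
print(df_rows)
```

The dictionary form is convenient when data arrives column by column; the list-of-rows form fits data read one record at a time.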


### Q4. What is a ‘DataFrame’ in pandas, and how is it different from `pandas.Series`? Explain with an example.

**DataFrame:** In pandas, a DataFrame is a two-dimensional labeled data structure that consists of rows and columns, similar to a table or spreadsheet. It is one of the primary data structures used in pandas for data manipulation and analysis. Each column in a DataFrame can have a different data type, and it is capable of handling heterogeneous data.

**Example:**

```python
import pandas as pd

# Create a dictionary mapping each column name to its values
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

# Create DataFrame
df = pd.DataFrame(data)

# Print the pandas DataFrame
print(df)
```

```text
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston
```


**Series:** A Series, on the other hand, is a one-dimensional labeled array capable of holding any data type (for example integers, strings, floats, or Python objects). It is essentially a single column of data with associated index labels. A DataFrame is composed of one or more Series, where each Series represents a column of the DataFrame.

**Example:**

```python
import pandas as pd

# Create the Pandas Series
ages = pd.Series([25, 30, 35, 40], name='Age')

# Print the pandas Series
print(ages)
```

```text
0    25
1    30
2    35
3    40
Name: Age, dtype: int64
```
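To make the difference concrete, the sketch below rebuilds the DataFrame from the first example and compares it with one of its columns. Selecting a column with single brackets returns a Series; the `ndim`, `shape`, and `dtypes` attributes show the one-dimensional versus two-dimensional structure and the per-column types.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# Single brackets select one column as a Series (named after the column)
ages = df['Age']
print(type(ages).__name__)          # Series
# Double brackets select a one-column DataFrame instead
print(type(df[['Age']]).__name__)   # DataFrame

# A Series is one-dimensional; a DataFrame is two-dimensional
print(ages.ndim, df.ndim)           # 1 2
print(ages.shape, df.shape)         # (4,) (4, 3)

# A DataFrame stores one dtype per column; a Series has a single dtype
print(df.dtypes)
print(ages.dtype)
```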


### Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?

There are numerous functions available in pandas for data manipulation in a DataFrame. Some common functions include:

1. **`head()` and `tail()`:** These functions allow you to view the first few rows (`head()`) or last few rows (`tail()`) of the DataFrame. They are useful for quickly inspecting the structure and contents of the DataFrame.

2. **`info()`:** This function provides a concise summary of the DataFrame, including the data types, number of non-null values, and memory usage. It is helpful for understanding the composition of the DataFrame.

3. **`describe()`:** This function generates descriptive statistics for numeric columns in the DataFrame, such as count, mean, standard deviation, minimum, maximum, and quartile values. It provides insights into the distribution and variability of the data.

4. **`shape`:** This attribute returns a tuple representing the dimensions of the DataFrame (number of rows, number of columns). It is useful for quickly determining the size of the DataFrame.

5. **`loc[]` and `iloc[]`:** These indexers (used with square brackets rather than called like functions) allow you to access rows and columns of the DataFrame by label (`loc[]`) or by integer position (`iloc[]`). They are useful for selecting specific subsets of data from the DataFrame.

6. **`drop()`:** This function removes rows or columns from the DataFrame by label (index labels for rows, column names for columns) and, by default, returns a new DataFrame rather than modifying the original. It is useful for cleaning or preprocessing the data by eliminating unwanted observations or features.

7. **`fillna()`:** This function fills missing (NaN) values in the DataFrame with a specified value, such as a constant or a per-column statistic like the mean or median (passed as a Series or dictionary keyed by column). Interpolation is handled by the separate `interpolate()` method. Both are helpful for handling missing data before performing analysis or modeling.

8. **`groupby()`:** This function enables you to group data in the DataFrame based on one or more columns and perform aggregation operations (e.g., sum, mean, count) within each group. It is useful for generating summary statistics or exploring relationships between variables.

9. **`merge()` and `concat()`:** These functions allow you to combine multiple DataFrames either by merging on common columns (`merge()`) or concatenating along a specified axis (`concat()`). They are useful for integrating data from different sources or performing relational operations.

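10. **`apply()`:** This function applies a custom function along an axis of the DataFrame: to each column (`axis=0`, the default) or to each row (`axis=1`). For element-wise transformations, use `DataFrame.map()` (named `applymap()` before pandas 2.1). These are useful for calculations that built-in pandas functions do not provide, although vectorized operations are usually faster when they exist.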
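Most of the functions listed above can be tried on a small table. The sketch below uses a made-up `sales` dataset and a hypothetical `regions` lookup table (both invented for illustration) to show inspection, selection, dropping, row-wise `apply()`, grouping, merging, and concatenation.

```python
import pandas as pd

# Made-up example data, for illustration only
sales = pd.DataFrame({
    'region': ['North', 'South', 'North', 'East', 'South'],
    'units': [10, 4, 7, 3, 8],
    'price': [2.5, 3.0, 2.5, 4.0, 3.0],
})
regions = pd.DataFrame({
    'region': ['North', 'South', 'East'],
    'manager': ['Ana', 'Ben', 'Cy'],
})

print(sales.head(3))      # first three rows
print(sales.shape)        # (5, 3): 5 rows, 3 columns
sales.info()              # dtypes, non-null counts, memory usage
print(sales.describe())   # statistics for the numeric columns

# Label-based vs position-based selection
print(sales.loc[0, 'region'])  # row label 0, column 'region' -> North
print(sales.iloc[1, 1])        # second row, second column -> 4

# drop() returns a new DataFrame without the 'price' column
no_price = sales.drop(columns='price')

# apply() with axis=1 calls the lambda once per row
sales['revenue'] = sales.apply(lambda row: row['units'] * row['price'], axis=1)

# groupby() + sum(): total units per region (keys sorted by default)
units_by_region = sales.groupby('region')['units'].sum()
print(units_by_region)

# merge() joins on the shared 'region' column, keeping every sales row
merged = sales.merge(regions, on='region', how='left')

# concat() stacks DataFrames vertically and renumbers the index
stacked = pd.concat([sales, sales], axis=0, ignore_index=True)
```

The row-wise `apply()` is shown only to illustrate the function; for this particular calculation, the vectorized `sales['units'] * sales['price']` gives the same result and is faster.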

For example, you might use the `fillna()` function to replace missing values in a DataFrame with the mean value of each column:

```python
import pandas as pd

# Create a DataFrame with missing values
data = {'A': [1, 2, None, 4],
        'B': [5, None, 7, 8]}
df = pd.DataFrame(data)

# Fill missing values with the mean of each column
df_filled = df.fillna(df.mean())

print(df_filled)
```

```text
          A         B
0  1.000000  5.000000
1  2.000000  6.666667
2  2.333333  7.000000
3  4.000000  8.000000
```


In this example, the missing value in column 'A' is replaced by that column's mean (2.333333), and the missing value in column 'B' by its mean (6.666667). Every row stays available for further analysis or modeling, instead of being discarded because it contains a missing value.

### Q6. Which of the following is mutable in nature: Series, DataFrame, or Panel?

In pandas, Series and DataFrame are both mutable objects, meaning their contents can be modified in place after creation. Panel was a mutable container as well, but it was deprecated in pandas 0.20 and removed in pandas 0.25; its role is now covered by a DataFrame with a MultiIndex (or by the separate xarray library).

Therefore, among the data structures available in current pandas, the answer is:

- Series
- DataFrame
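A minimal demonstration of mutability: the sketch below changes a Series value in place, then changes a DataFrame cell and adds a new column to the same object. The variable names are illustrative.

```python
import pandas as pd

# Modify a Series element in place (by index label 0)
s = pd.Series([1, 2, 3])
s.loc[0] = 100
print(s.tolist())  # [100, 2, 3]

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
# Modify a single DataFrame cell in place
df.loc[0, 'A'] = 10
# Add a new column to the existing DataFrame
df['C'] = df['A'] + df['B']
print(df)
```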

### Q7. Create a DataFrame using multiple Series. Explain with an example.

To create a DataFrame using multiple Series in pandas, you can pass a dictionary where the keys represent the column names and the values are the Series objects. Each Series will become a column in the resulting DataFrame. 

**Example:**

```python
import pandas as pd

# Define multiple Series
s1 = pd.Series(['Alice', 'Bob', 'Claire'], name='Name')
s2 = pd.Series([25, 30, 27], name='Age')
s3 = pd.Series(['Female', 'Male', 'Female'], name='Gender')

# Create a DataFrame using the Series
df = pd.DataFrame({'Name': s1, 'Age': s2, 'Gender': s3})

# Print the DataFrame
print(df)
```

```text
     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female
```


**Note:** The index labels are integers starting from 0 because none of the Series was given an explicit index; each Series has the default index 0, 1, 2, and the DataFrame takes its index from them.
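Two related behaviors are worth seeing. First, `pd.concat(..., axis=1)` also places Series side by side and uses each Series' `name` as its column label. Second, Series are combined by index label, not by position. The `scores` and `ages` Series below are made-up examples with custom string indexes, chosen to show that a label missing from one Series becomes NaN in that column.

```python
import pandas as pd

s1 = pd.Series(['Alice', 'Bob', 'Claire'], name='Name')
s2 = pd.Series([25, 30, 27], name='Age')
s3 = pd.Series(['Female', 'Male', 'Female'], name='Gender')

# concat along axis=1: one column per Series, labelled by Series.name
df = pd.concat([s1, s2, s3], axis=1)
print(df)

# Illustrative Series with explicit labels; 'Claire' has no score
scores = pd.Series([90, 85], index=['Alice', 'Bob'], name='Score')
ages = pd.Series([25, 30, 27], index=['Alice', 'Bob', 'Claire'], name='Age')

# Rows are matched by label; the missing score is filled with NaN
aligned = pd.DataFrame({'Score': scores, 'Age': ages})
print(aligned)
```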