In [1]:
import pandas as pd

data = [4, 8, 15, 16, 23, 42]
series = pd.Series(data)

print(series)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


In [2]:
data = [4, 8, 15, 16, 23, 42, 10, 20, 13, 27]
series = pd.Series(data)

print(series)

0     4
1     8
2    15
3    16
4    23
5    42
6    10
7    20
8    13
9    27
dtype: int64


In [4]:
data = {"Name": ['Alice', 'Bob', 'Claire'], "Age": [25, 30, 27], "Gender": ['Female', 'Male', 'Female']}
dataframe = pd.DataFrame(data)
print(dataframe)

     Name Age  Gender
0   Alice  25  Female
1     Bob  30    Male
2  Claire  27  Female


In pandas, a DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. A DataFrame can be thought of as a dictionary of Series objects, where each column is a Series. Because each column keeps its own type, a DataFrame can hold mixed data (numbers, strings, dates) side by side and supports operations such as filtering, grouping, sorting, and joining.
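A minimal sketch of the "dictionary of Series" idea: selecting one column returns a Series, and each column keeps its own dtype. The column names and values here are made up for illustration.

```python
import pandas as pd

# Each dictionary key becomes a column; each column is stored with its own dtype
df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [25, 30],
    "height_m": [1.65, 1.80],
})

# Selecting a single column returns a Series
age_column = df["age"]
print(type(age_column))

# Per-column dtypes: integers, floats and strings can live side by side
print(df.dtypes)
```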

A pandas.Series, by contrast, is a one-dimensional labeled array that can hold data of any type (integers, floats, strings, Python objects, and so on). Each value is paired with an index label, so a Series is similar to a single column in a spreadsheet or a database table.
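A small illustration, using hypothetical labels and values, of how a Series pairs each value with an index label and supports access both by label and by position (via iloc):

```python
import pandas as pd

# A Series pairs values with index labels; here the labels are weekday strings
temperatures = pd.Series([21.5, 19.0, 23.2], index=["Mon", "Tue", "Wed"], name="temp_c")

print(temperatures["Tue"])    # label-based access
print(temperatures.iloc[0])   # position-based access

# A Series is not limited to numbers; this one holds strings
labels = pd.Series(["low", "high", "medium"])
print(labels)
```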

Here is an example that demonstrates the difference between a DataFrame and a Series:

In [5]:
import pandas as pd

# Creating a DataFrame
data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 30, 35, 40],
        'city': ['New York', 'Paris', 'London', 'San Francisco']}
df = pd.DataFrame(data)

# Creating a Series
ages = pd.Series([25, 30, 35, 40], name='age')

# Accessing a column from DataFrame
print(df['age'])

# Accessing a value from Series
print(ages[1])


0    25
1    30
2    35
3    40
Name: age, dtype: int64
30


The pandas library provides many methods for manipulating data in a DataFrame. Some common ones are:

head() and tail() - These methods let you preview the first or last few rows of a DataFrame (5 by default), respectively.
describe() - This method generates descriptive statistics for the numeric columns of a DataFrame: count, mean, standard deviation, minimum, the 25th, 50th and 75th percentiles, and maximum.
groupby() - This method groups rows by the values in one or more columns so that you can compute a result (such as a mean or a sum) for each group.
dropna() - This method removes missing values; by default it drops every row that contains at least one missing value.
sort_values() - This method sorts a DataFrame by the values in one or more columns.
apply() - This method applies a function along an axis of a DataFrame: to each column by default, or to each row with axis=1.
merge() - This method combines two DataFrames by matching values in one or more common columns (or index levels), similar to a database join.
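The iris cell that follows exercises head(), tail(), describe(), groupby(), dropna() and sort_values(), but not apply() or merge(). The sketch here covers those two, plus dropna() on data that actually has a gap, using two small made-up tables (orders and customers are hypothetical) so it runs without downloading anything:

```python
import numpy as np
import pandas as pd

# Hypothetical order data; order 2 has a missing quantity
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [10, 20, 10],
    "quantity": [2, np.nan, 5],
})
customers = pd.DataFrame({
    "customer_id": [10, 20],
    "customer_name": ["Alice", "Bob"],
})

# dropna(): remove every row that contains a missing value
complete_orders = orders.dropna()

# apply() with axis=1: the lambda receives one row (as a Series) at a time
row_labels = complete_orders.apply(
    lambda row: f"order {int(row['order_id'])}: {int(row['quantity'])} items", axis=1
)
print(row_labels)

# merge(): match rows on the shared customer_id column (inner join by default)
merged = complete_orders.merge(customers, on="customer_id")
print(merged)
```

Row-wise apply() calls a Python function once per row, so for simple arithmetic a vectorized column expression (for example orders["quantity"] * 2) is usually faster.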

In [10]:
import pandas as pd
import seaborn as sns

# Load example datasets (sns.load_dataset fetches them online on first use)
df1 = sns.load_dataset('iris')
df2 = sns.load_dataset('titanic')  # titanic data, not used in this cell

print(df1.head())      # first 5 rows of the DataFrame
print(df1.tail(10))    # last 10 rows of the DataFrame
print(df1.describe())  # summary statistics for the numeric columns

# Mean sepal length for each species
grouped_df = df1.groupby('species')
print(grouped_df['sepal_length'].mean())

clean_df = df1.dropna()  # drop rows with missing values (iris has none)
sorted_df = df1.sort_values(['sepal_length', 'petal_length'], ascending=[True, False])

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
     sepal_length  sepal_width  petal_length  petal_width    species
140           6.7          3.1           5.6          2.4  virginica
141           6.9          3.1           5.1          2.3  virginica
142           5.8          2.7           5.1          1.9  virginica
143           6.8          3.2           5.9          2.3  virginica
144           6.7          3.3           5.7          2.5  virginica
145           6.7          3.0           5.2          2.3  virginica
146           6.3          2.5           5.0          1.9  virginica
147           6.5          3.0           5.2          2.0  virgi

Both Series and DataFrame are value-mutable: you can change the values they contain after they are created. A DataFrame is also size-mutable, so you can insert or delete columns. The older Panel structure (a 3-dimensional container) was deprecated in pandas 0.20 and removed in pandas 0.25; the recommended replacement is a DataFrame with a MultiIndex (hierarchical index).
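As a rough sketch of the MultiIndex approach that replaces Panel, the example below stores what would have been three axes (store, day, product) in a DataFrame whose rows are indexed by (store, day) pairs. The store, day and product names are hypothetical:

```python
import pandas as pd

# Rows: every (store, day) combination; columns: products
index = pd.MultiIndex.from_product(
    [["store_a", "store_b"], ["mon", "tue"]], names=["store", "day"]
)
sales = pd.DataFrame({"apples": [5, 3, 4, 6], "pears": [1, 2, 0, 3]}, index=index)

# One "slice" along the outer level, like selecting one item of a Panel
store_a = sales.loc["store_a"]

# Cross-section across all stores for a single day
tuesday = sales.xs("tue", level="day")

print(store_a)
print(tuesday)
```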

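Note that while Series and DataFrames are mutable, in-place modification can behave unexpectedly. In particular, chained indexing such as df[df['a'] > 0]['b'] = 1 may act on a temporary copy and leave the original unchanged, and pandas typically warns when it detects this. It is usually safer to use a single .loc assignment on the object you mean to change, or to work on an explicit copy created with .copy().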
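A minimal sketch of these ideas on a hypothetical price table: changing a value in place with a single .loc assignment, inserting a column, and then making further edits on an explicit copy so the original stays untouched:

```python
import pandas as pd

df = pd.DataFrame({"price": [1.0, 0.5, 2.5]})

# Value-mutable: change one value in place with a single .loc assignment
df.loc[0, "price"] = 1.25

# Size-mutable (DataFrame): insert a new column
df["discounted"] = df["price"] * 0.9

# Safer pattern: edit an explicit copy and leave df as it is
adjusted = df.copy()
adjusted["price"] = adjusted["price"] + 1.0

print(df)
print(adjusted)
```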

In [14]:
import pandas as pd

# Two Series: fruit names and their prices, aligned by the default integer index
fruits = pd.Series(["Apple", "Banana", "Cherry", "Durian"])
prices = pd.Series([1.00, 0.50, 2.50, 8.00])

# Creating a DataFrame by combining the two Series
fruit_prices_df = pd.DataFrame({"Fruit": fruits, "Price": prices})

# Printing the resulting DataFrame
print(fruit_prices_df)


    Fruit  Price
0   Apple    1.0
1  Banana    0.5
2  Cherry    2.5
3  Durian    8.0
