# Python pandas

## Introduction to Pandas
**```pandas```** is an open source, BSD-licensed library providing **high-performance, easy-to-use data structures** and **data analysis tools** for the Python programming language.\
It is a Python library used for working with data sets and has functions for analyzing, cleaning, exploring, and manipulating data.\
The name of this library is derived from the phrase '**Panel Data**', an Econometrics term for datasets that include multiple observations over multiple periods of time. With pandas, you can perform a wide range of data operations, including:

- Reading and writing data from various file formats like CSV, Excel and SQL databases.
- Cleaning and preparing data by handling missing values and filtering entries.
- Merging and joining multiple datasets seamlessly.
- Reshaping data through pivoting and stacking operations.
- Conducting statistical analysis and generating descriptive statistics.
- Visualizing data with integrated plotting capabilities.

Unlike ```NumPy``` or ```matplotlib```, ```pandas``` is NOT one of the essential components of the Python scientific computing tools.\
```pandas``` is built on top of ```NumPy``` and relies on its computational abilities. In that sense, pandas is a library that takes advantage of the NumPy array structure, which contributes to the fast execution of mathematical operations. Hence, pandas requires NumPy, and it is widely used by the Data Science community because it makes the work much faster and easier.

### NumPy or pandas: which comes first?
From a *programming perspective*, ```NumPy``` comes first and then ```pandas```, but in the day-to-day practice of a **Data Analyst** it's usually the other way around.\
This is because ```pandas``` focuses more on your analytical task and less on the underlying mathematical computations.\
You'd normally use ```pandas``` to import the data into Python, as it allows you to easily manipulate the data until you've obtained a clean, well-organized dataset that is ready for preprocessing.\
Once that is done, the dataset is ready for mathematical computations, which is when using ```NumPy``` to preprocess the data makes sense.

### Why ```pandas```?
- pandas makes the experience of doing Data Analysis in Python much **faster and easier** as\
  it offers a simple and intuitive way to **work with structured data**, especially using DataFrames.
- While NumPy is the library to opt for when dealing with numerical calculations,\
  ```pandas``` is specifically designed to **help with datasets** containing multiple types of information.
- Revolves around two primary Data structures: **Series** (1D) and **DataFrame** (2D).
- Makes **data exploration** easy, so you can quickly understand patterns or spot issues.
- Built **on top of NumPy**, efficiently manages large datasets, offering tools for data cleaning, transformation, and analysis.
- Seamlessly **integrates with other Python libraries** like NumPy, Matplotlib, and scikit-learn.
- Provides methods like .dropna() and .fillna() to **handle missing values** seamlessly.
- It **enhances analytical work** when you have to combine information across several datasets.
- pandas has the ability to import data from and export data to an **extensive set of file formats**.
- **Preserves Data Consistency**: The general rule for maintaining consistent data is to store values of a single type in your Series object or in each column of your DataFrame.
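
As a small illustration of the missing-value helpers mentioned in the list above, the sketch below applies ```.dropna()``` and ```.fillna()``` to a Series. The ```temperatures``` name and its values are invented for this example only.

```python
import numpy as np
import pandas as pd

# Hypothetical readings; np.nan marks a missing value
temperatures = pd.Series([21.5, np.nan, 23.0, np.nan, 22.1])

# dropna() returns a new Series without the missing entries
without_missing = temperatures.dropna()

# fillna() returns a new Series with the missing entries replaced
filled = temperatures.fillna(0.0)

print(len(without_missing))   # number of readings that were present
print(filled.tolist())
```

Both methods leave the original ```temperatures``` object untouched and hand back a new Series.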

# pandas Series and DataFrames
Pandas Series and DataFrames are the fundamental data structures in the pandas library, crucial for data manipulation and analysis in Python.
### pandas Series
- Single-column data.
- A set of observations related to a single variable.
- Corresponds to the 1D array structure from NumPy.
- Each element in a Series has a unique identifier called an *index*, which can be customized.

### pandas DataFrames
- Multi-column data.
- A collection of Series objects, which contains observations related to one or several variables.
- Hence, the information is organised into rows and columns.
- Corresponds to 2D structure from NumPy.
- Each column in a DataFrame is essentially a Pandas Series.
- You can access a single column of a DataFrame, and the result will be a Series object.

In essence, Series are the building blocks, and DataFrames provide a structured, tabular way to organize and work with multiple Series, enabling powerful data analysis capabilities. A tool that is applicable to a Series will most probably be relevant to a DataFrame as well, and vice versa. However, there can be exceptions, which we will discover as we practise and learn further.
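
To make the relationship between the two structures concrete, here is a minimal sketch. The small table is made up for illustration; it shows that selecting one column yields a Series and that a method such as ```max()``` is available on both objects.

```python
import pandas as pd

# Hypothetical two-column table
df = pd.DataFrame({'ProductName': ['Product A', 'Product B'],
                   'ProductPrice': [22250, 16600]})

# Selecting one column returns a Series that shares the DataFrame's row index
prices = df['ProductPrice']
print(type(prices))
print(prices.index.equals(df.index))

# max() exists on both structures
print(prices.max())                # a single value for the Series
print(df.max(numeric_only=True))   # one value per numeric column
```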

## Installation, Updating and Importing
```pip install pandas```

To upgrade to the latest version of pandas:\
```pip install pandas --upgrade```

```pandas``` is usually imported under the ```pd``` alias.

In [1]:
import pandas as pd

In [2]:
#checking pandas version
pd.__version__

'2.3.2'

# pandas Series
A Pandas Series is a one-dimensional labeled array in the Pandas library for Python. It is a fundamental data structure used for data analysis and manipulation, and can be thought of as a single column in a spreadsheet or a DataFrame.\
Key characteristics of a Pandas Series:
- **One-dimensional**: It represents a single sequence of data.
- **Labeled**: Each element in a Series has an associated label, known as its index. This index allows for efficient retrieval and alignment of data.
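- **Heterogeneous data types**: A Series can hold data of various types, including integers, floats, strings, booleans, and even Python objects. Each Series has a single ```dtype```, so mixing types in one Series falls back to the generic ```object``` dtype.
- **Size**: Most operations that add or remove elements, such as ```drop()``` or ```pd.concat()```, return a new Series rather than resizing the existing one, while the values themselves can be modified in place.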

## Creating a pandas Series

In [3]:
# We can create a 'Series' object from a list
products = ['A', 'B', 'C', 'D']
products

['A', 'B', 'C', 'D']

In [4]:
type(products)

list

In [5]:
new_products = pd.Series(products)

In [6]:
new_products

0    A
1    B
2    C
3    D
dtype: object

- We see that the 4 letters from the 'products' list have been organized into a column.
- The numbers displayed to the left of the letters are the **index values** of the Series.
- The dtype 'object' is the default data type assigned to non-numeric data such as strings.

In [7]:
type(new_products)

pandas.core.series.Series

In [8]:
# numeric data
daily_expenses = pd.Series([40, 45, 50, 60, 35])
daily_expenses

0    40
1    45
2    50
3    60
4    35
dtype: int64

In [9]:
print(daily_expenses)

0    40
1    45
2    50
3    60
4    35
dtype: int64


In [10]:
# pandas Series object corresponds to the 1-D NumPy array structure
import numpy as np

In [11]:
arr1 = np.array([10, 20, 30, 40, 50])
arr1

array([10, 20, 30, 40, 50])

In [12]:
type(arr1)

numpy.ndarray

In [13]:
series1 = pd.Series(arr1)
series1

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [14]:
type(series1)

pandas.core.series.Series

#### Key Takeaways:
1. The ```pandas``` Series object is something like a powerful version of the Python list, or an enhanced version of the NumPy array.\
This doesn't mean that a 'Series' should always be the preferred choice among the three, since there's always a trade-off between:

> *what you want to obtain from your data* **vs**\
> *the speed and precision with which you can do that*

However, if in a given situation you have opted for a 'Series', you will be working with a larger set of tools and capabilities that are specific to the ```pandas``` library. These tools and capabilities are often related to the fact that the 'Series' object stores its values in a sequenced order and has an *explicit index*.

2. Remember to always maintain ***data consistency***
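
A short sketch of why data consistency matters. The two Series below are invented for illustration: one holds only integers, the other mixes an integer-like string in with the numbers.

```python
import pandas as pd

consistent = pd.Series([40, 45, 50])   # all integers
mixed = pd.Series([40, '45', 50])      # one value is a string

print(consistent.dtype)   # a numeric dtype
print(mixed.dtype)        # falls back to the generic object dtype

# Numeric methods work on the consistent Series
print(consistent.sum())

# On the mixed Series, sum() fails because an int and a str cannot be added
sum_failed = False
try:
    mixed.sum()
except TypeError as err:
    sum_failed = True
    print('TypeError:', err)
```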

## Working with Attributes in Python

In [15]:
series1

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [16]:
series1.dtype

dtype('int64')

In [17]:
series1.size

5

In [18]:
new_products

0    A
1    B
2    C
3    D
dtype: object

In [19]:
new_products.dtype

dtype('O')

In [20]:
new_products.size

4

#### Attributes of a class coming into play
We already know that *attributes* deliver information about a given object, but how do they do that?\
They do that by returning a separate Python object. The benefit of this is that you can use the returned object in larger expressions in your code.\
For instance, if we pass the *size* of any Series as an argument to the *type* function, we will receive 'int' as the output, telling us that the value is an integer.

In [21]:
type(new_products.size)

int

Therefore, the number of elements stored in the variable can be referred to in further calculations by using this short expression.\
Another peculiarity of attributes is that, at a given moment, they may hold no specific value. Here's an example:

In [22]:
new_products.name

In [23]:
print(new_products.name)

None


Something to keep in mind:
> 'new_products' - the variable name we use to refer to that object in our code.\
> Object name - the label you want to see whenever its contents are displayed.

In other words, you work with the *new_products* object in your program, but the name you associate with its data can be a different one.

In [24]:
new_products.name = "New Products"
new_products

0    A
1    B
2    C
3    D
Name: New Products, dtype: object

In [25]:
print(new_products)

0    A
1    B
2    C
3    D
Name: New Products, dtype: object


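The attributes related to a certain Python object allow us to extract information about it.\
They are mainly meant for reading that information rather than for modifying the object's content, although a few of them, such as ```name```, can be assigned a new value.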
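
As a brief sketch of attribute access, the example below rebuilds the products Series and supplies the display name through the ```name``` parameter of ```pd.Series()``` instead of assigning it afterwards.

```python
import pandas as pd

# The display name can also be supplied when the Series is created
new_products = pd.Series(['A', 'B', 'C', 'D'], name='New Products')

# Attributes are read without parentheses and return ordinary Python objects
print(new_products.name)       # the display name set above
print(new_products.shape)      # a tuple holding the number of elements
print(new_products.size * 2)   # the returned int can be used in arithmetic
```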

## Using an Index in pandas

In Pandas, the **index** is a crucial component of both ```Series``` and ```DataFrame``` objects, serving as a *set of labels* that identify each element of a Series or each row of a DataFrame (a DataFrame's column labels are stored in an index object as well). It provides a mechanism for efficient data access, alignment, and identification.\
\
While explaining the use of an Index, we will work with a Series created from a *Dictionary* as opposed to a list or array.

In [26]:
prices_per_category = {'Product A': 22250, 'Product B': 16600, 'Product C': 15600}
prices_per_category

{'Product A': 22250, 'Product B': 16600, 'Product C': 15600}

In [27]:
type(prices_per_category)

dict

In [28]:
prices_per_category = pd.Series(prices_per_category)
prices_per_category

Product A    22250
Product B    16600
Product C    15600
dtype: int64

In [29]:
type(prices_per_category)

pandas.core.series.Series

We can see that the *keys* have been translated to *index values*. These indices act as labels that explicitly identify the integer values shown on the right side.\
This is exactly what a Series index needs to do.\
It contains *Index Values* that point to the relevant *Data Points* stored in a Series.

The Series *index* is a separate object.

In [30]:
prices_per_category.index

Index(['Product A', 'Product B', 'Product C'], dtype='object')

In [31]:
type(prices_per_category.index)

pandas.core.indexes.base.Index

Using indexes has many practical applications when working with pandas. A few crucial points to remember:
1. An *index* allows you to refer to a position within a sequence, or, in other words, within a set of values in a sequenced order.
2. You will be able to quickly access the prices of the relevant categories through their respective *indices*.
3. The *index* data structure will often turn out to be a way to speed up your computations while working with large datasets.
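
The alignment role of the index can be sketched as follows. The ```discounts``` Series is hypothetical and deliberately lists the products in a different order from the prices.

```python
import pandas as pd

prices = pd.Series({'Product A': 22250, 'Product B': 16600, 'Product C': 15600})
# Hypothetical discounts, listed in a different order on purpose
discounts = pd.Series({'Product C': 600, 'Product A': 250, 'Product B': 100})

# Membership test against the index labels
print('Product B' in prices.index)

# Arithmetic matches values by label, not by position
discounted = prices - discounts
print(discounted)
```

Each price is reduced by the discount that carries the same label, regardless of where that label sits in either Series.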

## Position-based vs Label-based Indexing

A Series is a one-dimensional labeled array, and its index provides a way to identify and retrieve its values. The index of a ```pandas Series``` is a very solid data structure. It can be **implicit** or **explicit**.

### Implicit Indexing (Position-based)

In [32]:
series_a = pd.Series([10, 20, 30, 40, 50])
series_a

0    10
1    20
2    30
3    40
4    50
dtype: int64

When the user doesn't specify an index, pandas attaches the default zero-based integer index to the object. This way, its values can be referred to via their *positions*. Hence, the *implicit* index is always the default numeric index.

In [33]:
series_a.index

RangeIndex(start=0, stop=5, step=1)

In [34]:
type(series_a.index)

pandas.core.indexes.range.RangeIndex

In [35]:
list(series_a.index)

[0, 1, 2, 3, 4]

In [36]:
series_a[2]

np.int64(30)

### Explicit Indexing (Label-based)
When a Series has a custom index, you can access elements using their corresponding labels. Labels are names that will logically correspond to the data values contained in the Series.

In [37]:
prices = pd.Series({'Product A': 22250, 'Product B': 16600, 'Product C': 12500})
prices

Product A    22250
Product B    16600
Product C    12500
dtype: int64

In this example, we have *explicitly* specified our index.

In [38]:
prices.index

Index(['Product A', 'Product B', 'Product C'], dtype='object')

In [39]:
type(prices.index)

pandas.core.indexes.base.Index

In [40]:
prices['Product C']

np.int64(12500)

## More on working with Indices in Python

In [41]:
seriesA = pd.Series([10, 20, 30, 40, 50])
prices = pd.Series({'Product A': 22500, 'Product B': 16600, 'Product C': 24600})

In [42]:
seriesA

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [43]:
seriesA[0]

np.int64(10)

In [44]:
prices

Product A    22500
Product B    16600
Product C    24600
dtype: int64

In [45]:
prices['Product B']

np.int64(16600)

In [46]:
prices.iloc[1]    # access by position; a bare prices[1] on a label index is deprecated and emits a FutureWarning

np.int64(16600)

In [47]:
seriesB = pd.Series([10, 20, 30, 40, 50], index = [1, 2, 3, 4, 5])
seriesB              # overrides implicit zero-based indexing

1    10
2    20
3    30
4    40
5    50
dtype: int64

In [48]:
seriesB[1]

np.int64(10)

In [49]:
seriesC = pd.Series([10, 20, 30, 40, 50], index = ['1', '2', '3', '4', '5'])
seriesC

1    10
2    20
3    30
4    40
5    50
dtype: int64

In [50]:
seriesC['1']

np.int64(10)

In [51]:
seriesC.iloc[0]    # access by position; a bare seriesC[0] on a label index is deprecated and emits a FutureWarning

np.int64(10)
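
Because a bare integer inside square brackets can mean either a label or a position, pandas also offers the explicit accessors ```.loc``` (labels) and ```.iloc``` (positions). The sketch below reuses the values from the cells above to show how the two differ.

```python
import pandas as pd

# Integer labels that do not start at 0, so label and position differ
seriesB = pd.Series([10, 20, 30, 40, 50], index=[1, 2, 3, 4, 5])

print(seriesB.loc[1])    # label 1 is the first element
print(seriesB.iloc[1])   # position 1 is the second element

prices = pd.Series({'Product A': 22500, 'Product B': 16600, 'Product C': 24600})
print(prices.loc['Product B'])   # by label
print(prices.iloc[1])            # by position
```

Using the accessors makes the intent explicit and avoids surprises when the index labels happen to be integers.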

## Using Methods in Python

A Python **object** is associated with a certain collection of **attributes** and **methods**.\
```Attributes``` provide the Metadata and are *passive*\
```Methods``` provide functionalities and behaviour of the object and are *active*

### Methods vs Functions
When provided with some initial data, both tools can perform specific operations on it and return an output. However,
- **Functions** - A function is an independent entity. It is not associated with an object by default.
- **Methods** - A method is generally applied to an object of a certain class. When called or invoked, a method has access to the object's data and can also manipulate the object's state.

Since we can't use a method unless there's an object to associate it with, different libraries contain their own sets of methods. Thus, they can be applied only to the types of objects associated with these libraries.
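
The distinction can be sketched with a tiny Series. The ```deposits``` name and its values are made up for this example.

```python
import pandas as pd

deposits = pd.Series([2000, 1000, 4000])

# len() and max() are built-in functions: the object is passed in as an argument
print(len(deposits))
print(max(deposits))

# sum() and idxmax() are Series methods: they are called on the object itself
print(deposits.sum())
print(deposits.idxmax())

# A plain list has no idxmax() method, because that method belongs to pandas objects
print(hasattr([2000, 1000, 4000], 'idxmax'))
```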

In [52]:
start_date_deposits = pd.Series({
    '7/4/2014' : 2000,
    '1/2/2015' : 2000,
    '12/8/2012' : 1000,
    '2/20/2015' : 2000,
    '10/28/2013' : 2000,
    '4/19/2015' : 2000,
    '7/4/2016' : 2000,
    '4/24/2014' : 2000,
    '9/3/2015' : 4000,
    '7/25/2016' : 2000,
    '5/1/2014' : 2000,
    '3/29/2013' : 2000,
    '10/3/2014' : 2000,
    '9/18/2015' : 2000,
})

In [53]:
start_date_deposits

7/4/2014      2000
1/2/2015      2000
12/8/2012     1000
2/20/2015     2000
10/28/2013    2000
4/19/2015     2000
7/4/2016      2000
4/24/2014     2000
9/3/2015      4000
7/25/2016     2000
5/1/2014      2000
3/29/2013     2000
10/3/2014     2000
9/18/2015     2000
dtype: int64

### Mathematical Methods

In [54]:
start_date_deposits.sum()

np.int64(29000)

In [55]:
start_date_deposits.min()

np.int64(1000)

In [56]:
start_date_deposits.max()

np.int64(4000)

In [57]:
start_date_deposits.idxmax()    # index of max value

'9/3/2015'

In [58]:
start_date_deposits.idxmin()    # index of min value

'12/8/2012'

### Non-mathematical Methods

In [59]:
start_date_deposits.head()    # first 5 rows

7/4/2014      2000
1/2/2015      2000
12/8/2012     1000
2/20/2015     2000
10/28/2013    2000
dtype: int64

In [60]:
start_date_deposits.tail()     # last 5 rows

7/25/2016    2000
5/1/2014     2000
3/29/2013    2000
10/3/2014    2000
9/18/2015    2000
dtype: int64

## Parameters vs Arguments

One of the best features of ```methods``` is that we can also modify their behaviour. Technically, we achieve that by knowing the ```parameters``` associated with a given method and then supplying the relevant ```arguments``` upon execution.\
\
For example, the ```head()``` method displays the first 5 rows of the Series by default.

In [61]:
start_date_deposits.head()

7/4/2014      2000
1/2/2015      2000
12/8/2012     1000
2/20/2015     2000
10/28/2013    2000
dtype: int64

However, it also gives us the option to choose the number of rows to be displayed. So, if we put the number 3 or 10 within the parentheses, we will obtain only the first 3 or 10 values, respectively.

In [62]:
start_date_deposits.head(3)

7/4/2014     2000
1/2/2015     2000
12/8/2012    1000
dtype: int64

In programming terms, the option **to set** the number of rows to display is called a **```parameter```** of the ```head()``` method. Our choice of that number in a given situation is referred to as an **```argument```**. Hence, the **parameter** allows us to modify the way in which the method will operate.\
\
Pandas methods have ```parameters``` you can supply with ```arguments``` to modify the behaviour of the given method. Furthermore, in case you are working with methods that have multiple parameters, it is good practice to refer to the parameters by using their names explicitly and in the right order (*e.g.* n=10).

In [63]:
start_date_deposits.head(n=10)

7/4/2014      2000
1/2/2015      2000
12/8/2012     1000
2/20/2015     2000
10/28/2013    2000
4/19/2015     2000
7/4/2016      2000
4/24/2014     2000
9/3/2015      4000
7/25/2016     2000
dtype: int64

So, what is the right order? and...\
**What, and how many parameters are associated with a certain pandas method?**

## Dive Deeper and learn Better
This and much more information about the library is extensively covered in [**the pandas Documentation**](https://pandas.pydata.org/docs/) *(discussed in more detail at the end)*.\
You can visit the linked site and read the documentation for a deeper understanding.\
If you want to look up something specific, say only the ```pd.Series()``` constructor, you can type it directly into your search engine. The official pandas documentation site is usually among the first results, and the link takes you directly to the page for the item you searched for.\
Another, quicker way of diving into the details of a method right here in the Jupyter Notebook is to press ```Shift + Tab``` after typing the method's name.
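
The parameters of a method can also be inspected from code. The sketch below uses Python's standard ```inspect``` module on ```Series.head```; this is one possible approach and gives information similar to the Jupyter tooltip.

```python
import inspect
import pandas as pd

# List the parameters of Series.head together with their default values
signature = inspect.signature(pd.Series.head)
for name, parameter in signature.parameters.items():
    print(name, parameter.default)

# help() prints the full docstring of the method
# help(pd.Series.head)
```

The listing shows ```n``` with its default of 5, which is why ```head()``` without arguments returns five rows.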

# pandas DataFrames
A Pandas DataFrame is a two-dimensional, mutable, tabular data structure with labeled axes (rows and columns). It is a core component of the Pandas library in Python, designed for efficient data manipulation and analysis, and is widely used in data science, machine learning, and other data-intensive fields.\
Key characteristics of a Pandas DataFrame:
- **Tabular Structure**: Data is organized in rows and columns, similar to a spreadsheet or a SQL table.
- **Labeled Axes**: Both rows and columns have labels (an index for rows and column names for columns), allowing for easy access and manipulation of data by name.
- **Heterogeneous Data Types**: Columns can hold different data types (e.g., integers, floats, strings, booleans, dates), while all elements within a single column typically share the same data type.
- **Mutable**: DataFrames can be modified after creation, allowing for adding, deleting, or updating columns and rows.

## How can DataFrames improve your Analysis?
Rather than seeing DataFrames as just 2D arrays, you can think of a DataFrame as a collection of multiple **observations** (rows) for the given **variables** (columns). Thus, you can obtain the information contained in a single data point by referring to the relevant observation of a certain variable.\
\
In technical terms, you can use the **row index** of a ```Series``` as a *Single Point of Reference* to obtain a certain data value.\
In the case of a DataFrame, we need *Two Points of Reference*: the **row index** and the **column index**. Using just one point of reference will return either an entire row or an entire column.
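
A minimal sketch of one versus two points of reference, using the ```.loc``` accessor and a small product table:

```python
import pandas as pd

df = pd.DataFrame({'ProductName': ['Product A', 'Product B', 'Product C'],
                   'ProductPrice': [22250, 16600, 12500]})

# Two references (row label, column label) give a single value
print(df.loc[2, 'ProductPrice'])

# One reference gives a whole column or a whole row, each as a Series
print(df['ProductPrice'])   # entire column
print(df.loc[2])            # entire row
```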

## Series and DataFrames as Programming Objects
**```Series```** can be seen as a **powerful version of the Python List**.\
However, it also includes some Python **Dictionary features**, as a Series' *index* resembles the *keys* of a dictionary, thus allowing us to extract the desired parts of the given dataset more quickly and efficiently.\
\
Taking that into account, the pandas **```DataFrame```** object can be seen as an **enhanced Python Dictionary**. Much like creating a ```Series``` from a Python dictionary, we can construct a ```DataFrame``` from one. The difference is that we don't need to associate a single value with each dictionary key. We can provide a whole object that contains the values of an entire column for each key.\
Thus, DataFrames share many characteristics of the Dictionary class, such as looking up a column by its key, although ```DataFrame``` is not a subclass of ```dict```.
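
The dictionary-like behaviour can be sketched with a few operations that work the same way on a ```dict``` and on a ```DataFrame```:

```python
import pandas as pd

data = {'ProductName': ['Product A', 'Product B'],
        'ProductPrice': [22250, 16600]}
df = pd.DataFrame(data)

# Dictionary-style operations that also work on a DataFrame
print(list(df.keys()))              # column labels, like dict keys
print('ProductPrice' in df)         # membership test checks the column labels
print(df['ProductName'].tolist())   # look up a column by its key

# The resemblance is behavioural: a DataFrame is not an instance of dict
print(isinstance(df, dict))
```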

## Creating DataFrames from Scratch
In a professional environment, you will typically be required to work with an already existing **dataset**. After you import it, you'll be able to clean and preprocess its data and take your analysis from there.\
\
However, you must know how to create a **DataFrame** from scratch. This is a great exercise to better internalise its structure and functionalities.\
The sections below cover 6 different ways of creating a DataFrame.

Please keep the following procedure in mind throughout the upcoming examples.\
We'll store the initial structure in an *object* called **```data```** and then pass it as an *argument* to the ```pd.DataFrame()``` constructor right after that.\
In the end, we'll use the well-known ```df``` convention to form and display our final DataFrame object.

### #1: DataFrame from dictionary of lists:

In [64]:
data = {'ProductName':['Product A', 'Product B', 'Product C'], 'ProductPrice':[22250, 16600, 12500]}
df = pd.DataFrame(data)
df

| | ProductName | ProductPrice |
|---|---|---|
| 0 | Product A | 22250 |
| 1 | Product B | 16600 |
| 2 | Product C | 12500 |


- As you can see, since we didn't specify an *index*, pandas attached the **implicit default integer index**.
- Much like in a *Series*:
  - the ```DataFrame index``` is located on the **left**,
  - the ```column labels``` are at the **top**, and
  - the ```data values``` are located in the **middle**, at the intersections.
- For example:
  - the *ProductName* of *observation 0* is ```Product A```
  - the *ProductPrice* of *observation 2* is ```12500```
- You can also see how **Data Consistency** is preserved in this table: the first column contains only strings, the second one only integers.

In [65]:
data = {1: ['Product A', 'Product B', 'Product C'], 2: [22250, 16600, 12500]}
df = pd.DataFrame(data)
df

| | 1 | 2 |
|---|---|---|
| 0 | Product A | 22250 |
| 1 | Product B | 16600 |
| 2 | Product C | 12500 |


- While creating a ```DataFrame``` from scratch, we must always be careful with the number of records and dimensions we provide.

In [66]:
# data = {'ProductName':['Product A', 'Product B', 'Product C', 'Product D'], 'ProductPrice':[22250, 16600, 12500]}
# df = pd.DataFrame(data)
# df

# will lead to ValueError since the length of the two arrays is different.
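
The commented-out cell above can be run safely inside a ```try```/```except``` block. This sketch catches the error and prints its message instead of stopping the notebook.

```python
import pandas as pd

# Four names but only three prices: the column lengths do not match
data = {'ProductName': ['Product A', 'Product B', 'Product C', 'Product D'],
        'ProductPrice': [22250, 16600, 12500]}

error_message = None
try:
    df = pd.DataFrame(data)
except ValueError as err:
    # pandas refuses to build a table from columns of unequal length
    error_message = str(err)
    print('ValueError:', error_message)
```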

### #2: DataFrame from dictionary of lists + specify an index:

In [67]:
# making use of the 'index' parameter in pd.DataFrame method
data = {'ProductName': ['Prod A', 'Prod B', 'Prod C'], 'ProductPrice':[22250, 16600, 12500]}
df = pd.DataFrame(data, index = ['A', 'B', 'C'])
df

| | ProductName | ProductPrice |
|---|---|---|
| A | Prod A | 22250 |
| B | Prod B | 16600 |
| C | Prod C | 12500 |


In [68]:
# more professional way
data = {'ProductName': ['Prod A', 'Prod B', 'Prod C'], 'ProductPrice':[22250, 16600, 12500]}
product_Ids = ['A', 'B', 'C']
df = pd.DataFrame(data, index = product_Ids)
df

| | ProductName | ProductPrice |
|---|---|---|
| A | Prod A | 22250 |
| B | Prod B | 16600 |
| C | Prod C | 12500 |


### #3: DataFrame from a list of dictionaries:
In terms of code, this is lengthy. But this practice involves placing the dictionaries one below the other with consistent *indentation* to improve code readability. Any inconsistency in the DataFrame is easier to identify than with other techniques.\
In this case, each dictionary represents one row, or **record**, of the ```DataFrame```.\
The ```keys``` act as *column names*.

In [69]:
data = [
    {'ProductName': 'Prod A', 'ProductPrice': 22250},
    {'ProductName': 'Prod B', 'ProductPrice': 16600},
    {'ProductName': 'Prod C', 'ProductPrice': 12500},
]
df = pd.DataFrame(data)
df

| | ProductName | ProductPrice |
|---|---|---|
| 0 | Prod A | 22250 |
| 1 | Prod B | 16600 |
| 2 | Prod C | 12500 |


In [70]:
data = [
    {'ProductName': 'Prod A', 'ProductPrice': 22250},
    {'ProductName': 'Prod B', 'ProductPrice': 16600},
    {'ProductName': 'Prod C', 'ProductPrice': [12500, 15300]},
]
df = pd.DataFrame(data)
df

| | ProductName | ProductPrice |
|---|---|---|
| 0 | Prod A | 22250 |
| 1 | Prod B | 16600 |
| 2 | Prod C | [12500, 15300] |


In [71]:
data = [
    {'ProductName': 'Prod A', 'ProductPrice': 22250},
    {'ProductName': 'Prod B', 'ProductPrice': 16600},
    {'ProductName': 'Prod C', 'ProductPrice': 12500},
    {'ProductName': 'Prod D'}
]
df = pd.DataFrame(data)
df

| | ProductName | ProductPrice |
|---|---|---|
| 0 | Prod A | 22250.0 |
| 1 | Prod B | 16600.0 |
| 2 | Prod C | 12500.0 |
| 3 | Prod D | NaN |


Here, we can clearly see how not providing a value for *ProductPrice* in the *4th observation* led to a ```NaN``` being placed there.
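
The effect of the missing value on the column can be checked directly. The sketch below uses a shortened, two-record version of the data. It also shows the nullable ```Int64``` dtype as one possible way to keep whole numbers alongside a missing entry.

```python
import pandas as pd

data = [
    {'ProductName': 'Prod A', 'ProductPrice': 22250},
    {'ProductName': 'Prod D'},   # no price supplied
]
df = pd.DataFrame(data)

# The missing price is stored as NaN, which turns the column into floats
print(df['ProductPrice'].dtype)

# isna() flags the missing entries
print(df['ProductPrice'].isna().tolist())

# Optional: the nullable integer dtype keeps whole numbers and marks the gap as <NA>
nullable_prices = df['ProductPrice'].astype('Int64')
print(nullable_prices)
```

This explains why the prices in the table above are displayed with a decimal point.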

### #4: DataFrame from a dictionary of pandas Series:

In [72]:
ser_products = pd.Series(['Prod A', 'Prod B', 'Prod C'])
ser_prices = pd.Series([22250, 16600, 12500])

In [73]:
data = {'ProductName': ser_products, 'ProductPrice': ser_prices}
df = pd.DataFrame(data)
df

| | ProductName | ProductPrice |
|---|---|---|
| 0 | Prod A | 22250 |
| 1 | Prod B | 16600 |
| 2 | Prod C | 12500 |


In [74]:
ser_products = pd.Series(['Prod A', 'Prod B', 'Prod C'], index = ['A', 'B', 'C'])
ser_prices = pd.Series([22250, 16600, 12500], index = ['A', 'B', 'C'])

data = {'ProductName': ser_products, 'ProductPrice': ser_prices}
df = pd.DataFrame(data)
df

| | ProductName | ProductPrice |
|---|---|---|
| A | Prod A | 22250 |
| B | Prod B | 16600 |
| C | Prod C | 12500 |


In [75]:
# change in index order in Series
ser_products = pd.Series(['Prod A', 'Prod B', 'Prod C'], index = ['A', 'B', 'C'])
ser_prices = pd.Series([22250, 16600, 12500], index = ['C', 'B', 'A'])

data = {'ProductName': ser_products, 'ProductPrice': ser_prices}
df = pd.DataFrame(data)
df

| | ProductName | ProductPrice |
|---|---|---|
| A | Prod A | 12500 |
| B | Prod B | 16600 |
| C | Prod C | 22250 |
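
In the last cell above, the prices were matched to the products by index label rather than by position. The sketch below pushes this one step further with hypothetical Series whose labels only partly overlap, to show what pandas does when a label is missing on one side.

```python
import pandas as pd

ser_products = pd.Series(['Prod A', 'Prod B'], index=['A', 'B'])
ser_prices = pd.Series([16600, 12500], index=['B', 'C'])   # no price for 'A', no name for 'C'

df = pd.DataFrame({'ProductName': ser_products, 'ProductPrice': ser_prices})
print(df)

# The rows are the union of both indexes; gaps are filled with NaN
print(df.index.tolist())
print(df.isna().sum().tolist())   # missing values per column
```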


### #5: DataFrame from a list of lists:

In [76]:
data = [['Prod A', 22250], ['Prod B', 16600], ['Prod C', 12500]]
df = pd.DataFrame(data)
df

| | 0 | 1 |
|---|---|---|
| 0 | Prod A | 22250 |
| 1 | Prod B | 16600 |
| 2 | Prod C | 12500 |


In [77]:
data = [['Prod A', 22250], ['Prod B', 16600], ['Prod C', 12500, 5000]]
df = pd.DataFrame(data)
df

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | Prod A | 22250 | NaN |
| 1 | Prod B | 16600 | NaN |
| 2 | Prod C | 12500 | 5000.0 |


In [78]:
data = [['Prod A', 22250], ['Prod B', 16600], ['Prod C', 12500]]
df = pd.DataFrame(data)
df

| | 0 | 1 |
|---|---|---|
| 0 | Prod A | 22250 |
| 1 | Prod B | 16600 |
| 2 | Prod C | 12500 |


In [79]:
df.columns = ['ProductName', 'ProductPrice']
df

| | ProductName | ProductPrice |
|---|---|---|
| 0 | Prod A | 22250 |
| 1 | Prod B | 16600 |
| 2 | Prod C | 12500 |


In [80]:
df.index = ['A', 'B', 'C']
df

| | ProductName | ProductPrice |
|---|---|---|
| A | Prod A | 22250 |
| B | Prod B | 16600 |
| C | Prod C | 12500 |


### #6: Construct DataFrame in a Professional Way:
While constructing a DataFrame, you need to provide info about:
- its data
- its columns
- its index

In [81]:
df = pd.DataFrame(data = [['Prod A', 22250], ['Prod B', 16600], ['Prod C', 12500]],
                 columns = ['ProductName', 'ProductPrice'],
                 index = ['A', 'B', 'C'])
df

| | ProductName | ProductPrice |
|---|---|---|
| A | Prod A | 22250 |
| B | Prod B | 16600 |
| C | Prod C | 12500 |


In [82]:
df.shape

(3, 2)

In [83]:
df.size

6
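
The two attributes above are related: ```size``` is the number of rows multiplied by the number of columns reported by ```shape```. A short sketch on the same DataFrame:

```python
import pandas as pd

df = pd.DataFrame(data=[['Prod A', 22250], ['Prod B', 16600], ['Prod C', 12500]],
                  columns=['ProductName', 'ProductPrice'],
                  index=['A', 'B', 'C'])

rows, columns = df.shape
print(rows * columns == df.size)   # size is rows times columns
print(len(df))                     # len() counts rows only
print(df.dtypes)                   # one dtype per column
```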

Just as we saw earlier in the *pandas Series* section, you can dive deeper into the concepts behind *DataFrames* in [**the Pandas Documentation**](https://pandas.pydata.org/docs/).

## The pandas Documentation

The **```pandas Documentation```**, often referred to as ```The Docs```, is the official and comprehensive **collection of resources** that explain how to use, understand, and contribute to the Pandas library in Python. It serves as a **primary reference** for anyone working with Pandas, from beginners to experienced developers, and is **continuously updated**.\
\
The documentation is organized into key sections:
- **Getting Started Guides**: Introductions to the main concepts of Pandas, often including tutorials for new users.
- **User Guide**: In-depth explanations of core concepts, background information, and detailed discussions of various functionalities with examples. This section covers topics like data structures (DataFrame, Series), working with missing data, data manipulation, time series, and more.
- **API Reference**: A comprehensive guide to the Pandas Application Programming Interface (API), detailing all public objects, functions, methods, and their parameters. This section is valuable for understanding the exact usage and behavior of specific Pandas components.
- **Developer Guide**: Provides information for those interested in contributing to the Pandas library or its documentation. It includes guidelines for coding standards, testing, and documenting code.

The Pandas documentation is typically found on the official [**Pandas website**](https://pandas.pydata.org/docs/) and is regularly updated with new versions of the library. It is a **crucial resource** for effective and efficient use of Pandas in data analysis and manipulation tasks as it can be a great point of reference throughout your work as a programmer.