## Pandas

Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its own data structures. The name pandas is derived from "panel data", an econometrics term for multidimensional data sets.

Prior to pandas, Python was used mainly for data munging and preparation and contributed little to data analysis itself. Pandas filled this gap. Using pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of its origin: load, prepare, manipulate, model, and analyze.

Python with pandas is used in a wide range of academic and commercial domains, including finance, economics, statistics, and analytics.

Key Features of pandas:
- Fast and efficient DataFrame object with default and customized indexing.
- Tools for loading data into in-memory data objects from different file formats.
- Data alignment and integrated handling of missing data.
- Reshaping and pivoting of data sets.
- Label-based slicing, indexing and subsetting of large data sets.
- Columns from a data structure can be deleted or inserted.
- Group by data for aggregation and transformations.
- High performance merging and joining of data.
- Time Series functionality.

Pandas deals with the following data structures:
- Series
- DataFrame
- Panel (deprecated in pandas 0.20 and removed in 0.25, so it is not available in current releases)

These data structures are built on top of NumPy arrays, which means they are fast.

In [None]:
import numpy as np
import pandas as pd

## Series

A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index.

To create a Series object, we can use the `pandas.Series` constructor and pass a list-like collection of values (e.g., a Python list or a NumPy array).

`pandas.Series(<data>)`: One-dimensional ndarray with axis labels (indices).
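As a minimal sketch of the constructor described above, the following builds one Series from a plain list and one from a NumPy array. The values and the string labels (`"ana"`, `"ben"`, `"cy"`) are hypothetical and only serve to show the optional `index` argument.

```python
import numpy as np
import pandas as pd

# From a plain Python list: the index defaults to 0, 1, 2, ...
from_list = pd.Series([88, 92, 75])

# From a NumPy array, with explicit (hypothetical) string labels as the index
from_array = pd.Series(np.array([88, 92, 75]), index=["ana", "ben", "cy"])

print(from_list.index.tolist())  # [0, 1, 2]
print(from_array["ben"])         # 92
```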

In [None]:
grades = pd.Series(np.random.randint(50, 101, size=100))
grades

In [None]:
catalog = pd.read_csv("catalog.csv")
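The `read_csv` call above expects a file named `catalog.csv` on disk. For readers who do not have that file, here is a small sketch that parses the same kind of data from an in-memory string. The three rows are copied from the catalog output shown later in this notebook; the variable names `csv_text` and `sample` are assumptions made for illustration.

```python
import io
import pandas as pd

# Three-row sample in the same column layout as the catalog data
csv_text = """Product ID,Category,Price,Inventory,Weight,Condition
142019521928,Office Supplies,42.31,9,26.4,Used
135617259927,Furniture,29.28,9,13.4,New
786588848050,Furniture,98.50,6,43.7,Refurbished
"""

# read_csv accepts any file-like object, not only a path
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)   # (3, 6)
print(sample.dtypes)  # column types inferred by pandas
```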

In [None]:
type(grades)

In [None]:
grades.shape

The `index` property can be used to get the indices of a Series. If indices are not manually assigned or changed, by default they will be values starting from zero similar to NumPy array indices.

`Series.index`: The index (axis labels) of the Series.

In [None]:
grades.index

To get the value associated with an index label, use the `loc` property. For instance, `loc[2]` returns the row whose index label is 2. In contrast, `iloc` returns rows based on their position. For example, `iloc[2]` returns the third row in the dataset, regardless of its index label.

`Series.loc`: Access a group of rows and columns by label(s) or a boolean array.

`Series.iloc`: Purely integer-location based indexing for selection by position.
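With the default index, labels and positions coincide, so the difference between `loc` and `iloc` is easy to miss. The sketch below uses a hypothetical Series whose integer labels are deliberately out of order to make the distinction visible.

```python
import pandas as pd

# Hypothetical Series whose integer labels do not match the positions
s = pd.Series([10, 20, 30, 40], index=[3, 2, 1, 0])

by_label = s.loc[2]      # the element labeled 2
by_position = s.iloc[2]  # the third element, whatever its label

print(by_label)     # 20
print(by_position)  # 30
```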

In [None]:
grades.loc[2]

In [None]:
grades.iloc[2]

Indexing the Series directly behaves like `loc` here, because the index consists of integer labels.

In [None]:
grades[2]

To get the number of elements in a Series, the `len` function can be used.

In [None]:
len(grades)

To delete an element from the Series in place, the `del` keyword can be used.

In [None]:
del grades[2]

In [None]:
len(grades)

In [None]:
try:
    grades.loc[2]
except KeyError:
    # label 2 was deleted above, so the lookup raises KeyError
    print("element does not exist")

In [None]:
grades.iloc[2]

In [None]:
try:
    del grades[2]
except KeyError:
    # deleting a label that is no longer present also raises KeyError
    print("element does not exist")

`Series.drop(<labels>)`: Return a Series with the specified index labels removed. This leaves the original Series unchanged. If `inplace=True` is passed, the operation is performed in place and `None` is returned.

In [None]:
grades.drop([1,3])
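The difference between the default behavior of `drop` and `inplace=True` can be sketched on a small, hypothetical Series (the values below are made up for illustration):

```python
import pandas as pd

s = pd.Series([50, 60, 70, 80])

# Default: a new Series is returned and s keeps all four elements
trimmed = s.drop([1, 3])
print(len(s), len(trimmed))  # 4 2

# inplace=True: s itself is modified and the call returns None
result = s.drop([1, 3], inplace=True)
print(result, len(s))        # None 2
```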

Regular slicing operations work with Series.

In [None]:
grades.iloc[:5]

In [None]:
grades.iloc[2:6]

In [None]:
grades.iloc[-5:]

`Series.head(<n=5>)`: Return the first n rows.

In [None]:
grades.head()

`Series.tail(<n=5>)`: Return the last n rows.

In [None]:
grades.tail(10)

`Series.describe`: Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

In [None]:
grades.describe()
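Because `grades` is random, its summary changes on every run. The following sketch applies `describe` to a small hypothetical Series with one missing value, so the result can be checked by hand and the exclusion of NaN is visible in the count.

```python
import numpy as np
import pandas as pd

# Four numbers plus one missing value (hypothetical data)
s = pd.Series([1.0, 2.0, 3.0, 4.0, np.nan])

summary = s.describe()

print(summary["count"])  # 4.0, the NaN is not counted
print(summary["mean"])   # 2.5
print(summary["50%"])    # 2.5, the median
```

The result is itself a Series, so individual statistics can be looked up by label as shown.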

`Series.values`: Return Series as ndarray or ndarray-like depending on the dtype.

In [None]:
grades.values

Regular statistical methods are available for Series with numerical values.

In [None]:
grades.sum()

In [None]:
grades.mean()

In [None]:
grades.std()

In [None]:
grades.var()

In [None]:
grades.min()

In [None]:
grades.max()

In [None]:
grades.cumsum()

In [None]:
grades+5
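The statistical methods and the element-wise addition used above can be verified on a small hand-made Series. The values are hypothetical; note that `var` and `std` in pandas use the sample definition (`ddof=1`) by default.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

total = s.sum()        # 1 + 2 + 3 + 4
average = s.mean()
variance = s.var()     # sample variance, divides by n - 1
running = s.cumsum()   # running total, same length as s
shifted = s + 5        # adds 5 to every element, s is unchanged

print(total, average)    # 10 2.5
print(running.tolist())  # [1, 3, 6, 10]
print(shifted.tolist())  # [6, 7, 8, 9]
```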

In [None]:
catalog

Unnamed: 0,Product ID,Category,Price,Inventory,Weight,Condition
0,142019521928,Office Supplies,42.31,9,26.4,Used
1,135617259927,Furniture,29.28,9,13.4,New
2,786588848050,Furniture,98.50,6,43.7,Refurbished
3,957702802601,Electrical,14.68,26,17.1,Refurbished
4,716662964693,Furniture,27.17,0,24.2,Refurbished
5,868419050073,Electrical,45.08,41,29.9,New
6,91879907254,Industrial,22.24,3,48.9,New
7,147369664112,Office Supplies,35.12,3,7.7,New
8,861726542104,Office Supplies,41.37,16,36.7,New
9,236598380667,Electrical,84.64,24,27.8,Refurbished


In [None]:
catalog.info()

In [None]:
catalog.shape

(10000, 6)

In [None]:
catalog.describe(include='all')

Unnamed: 0,Product ID,Category,Price,Inventory,Weight,Condition
count,10000.0,10000,10000.0,10000.0,10000.0,10000
unique,,5,,,,3
top,,Electrical,,,,Refurbished
freq,,2055,,,,3341
mean,505851200000.0,,54.591341,24.5151,25.05608,
std,287359800000.0,,25.949133,14.475853,14.528007,
min,7068757.0,,10.0,0.0,0.1,
25%,258777500000.0,,32.395,12.0,12.4,
50%,507090600000.0,,54.34,24.0,24.9,
75%,753889600000.0,,77.0525,37.0,37.9,


In [None]:
len(catalog)

10000

In [None]:
catalog.loc[1]

Product ID    135617259927
Category         Furniture
Price                29.28
Inventory                9
Weight                13.4
Condition              New
Name: 1, dtype: object

In [None]:
catalog.iloc[1]

Product ID    135617259927
Category         Furniture
Price                29.28
Inventory                9
Weight                13.4
Condition              New
Name: 1, dtype: object

In [None]:
catalog.head(10)

Unnamed: 0,Product ID,Category,Price,Inventory,Weight,Condition
0,142019521928,Office Supplies,42.31,9,26.4,Used
1,135617259927,Furniture,29.28,9,13.4,New
2,786588848050,Furniture,98.5,6,43.7,Refurbished
3,957702802601,Electrical,14.68,26,17.1,Refurbished
4,716662964693,Furniture,27.17,0,24.2,Refurbished
5,868419050073,Electrical,45.08,41,29.9,New
6,91879907254,Industrial,22.24,3,48.9,New
7,147369664112,Office Supplies,35.12,3,7.7,New
8,861726542104,Office Supplies,41.37,16,36.7,New
9,236598380667,Electrical,84.64,24,27.8,Refurbished


In [None]:
catalog.tail(10)

Unnamed: 0,Product ID,Category,Price,Inventory,Weight,Condition
9990,323692635146,Lighting,62.21,38,10.0,Used
9991,156636116293,Industrial,28.08,5,21.8,New
9992,401297038633,Industrial,52.7,16,37.5,Used
9993,845444023134,Lighting,43.41,29,4.2,Used
9994,926912112440,Industrial,80.5,22,1.4,Refurbished
9995,887953572393,Electrical,75.43,0,14.4,New
9996,585077919932,Lighting,72.28,21,6.6,New
9997,192754250040,Industrial,75.69,49,11.2,New
9998,395994805746,Lighting,47.81,39,28.0,Refurbished
9999,710891685647,Industrial,79.47,9,3.3,New


In [None]:
catalog["Price"].max()

99.99

In [None]:
catalog.columns

In [None]:
catalog["Product ID"]

In [None]:
type(catalog["Product ID"])

In [None]:
catalog[["Product ID", "Category"]]

In [None]:
catalog['Category'].unique()

In [None]:
catalog['Category'].nunique()
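The column selections above return different types depending on the brackets used. A minimal sketch on a hypothetical three-row frame (same column names as the catalog, made-up rows):

```python
import pandas as pd

df = pd.DataFrame({
    "Product ID": [1, 2, 3],
    "Category": ["Furniture", "Electrical", "Furniture"],
})

one_column = df["Category"]                   # single label -> Series
two_columns = df[["Product ID", "Category"]]  # list of labels -> DataFrame

print(type(one_column).__name__)   # Series
print(type(two_columns).__name__)  # DataFrame

# unique() lists distinct values in order of appearance; nunique() counts them
print(one_column.unique().tolist())  # ['Furniture', 'Electrical']
print(one_column.nunique())          # 2
```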

In [None]:
del catalog["Weight"]

In [None]:
a = catalog.drop([1, 2])
a

In [None]:
a.iloc[1]

In [None]:
a.iloc[2]
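The cells above drop the rows labeled 1 and 2 and then look up positions 1 and 2 in the result. The sketch below repeats this on a hypothetical five-row frame to show that `drop` returns a new object and that positions shift while labels stay as they were.

```python
import pandas as pd

df = pd.DataFrame({"Price": [10.0, 20.0, 30.0, 40.0, 50.0]})

# Remove the rows labeled 1 and 2; df itself is not modified
a = df.drop([1, 2])

print(a.index.tolist())  # [0, 3, 4]
print(a.iloc[1].name)    # 3, the second remaining row keeps its original label
print(len(df))           # 5
```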

In [None]:
catalog

In [None]:
b = a.drop("Category", axis=1)
b

In [None]:
catalog

In [None]:
b.head()

In [None]:
catalog['Price'].mean()

In [None]:
catalog['Price'].max()

In [None]:
catalog['Price'].min()

## Aggregation

In [None]:
catalog["Price"].sum()

In [None]:
agg = catalog.groupby('Category')["Inventory"].sum()
agg

In [None]:
agg.to_csv("agg.csv")

In [None]:
catalog.groupby(["Category", "Condition"])["Inventory"].mean()

In [None]:
catalog.groupby("Condition")["Price"].max()
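To make the grouped results easy to check by hand, here is a sketch of the same `groupby` pattern on a hypothetical four-row frame. The column names match the catalog; the rows are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["Furniture", "Furniture", "Electrical", "Electrical"],
    "Condition": ["New", "Used", "New", "New"],
    "Inventory": [9, 6, 26, 41],
})

# One group per category; the result is a Series indexed by category
totals = df.groupby("Category")["Inventory"].sum()
print(totals["Electrical"])  # 67
print(totals["Furniture"])   # 15

# Grouping by two columns yields a Series with a two-level index
means = df.groupby(["Category", "Condition"])["Inventory"].mean()
print(means[("Electrical", "New")])  # 33.5
```

Group keys are sorted by default, so `Electrical` appears before `Furniture` in the result.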

## Slicing

In [None]:
catalog

In [None]:
catalog["Condition"] == "New"

In [None]:
catalog[catalog["Condition"] == "New"]

In [None]:
catalog[(catalog["Inventory"] > 0) & (catalog["Condition"] == "New")]

In [None]:
catalog[(catalog["Inventory"] > 0) | (catalog["Condition"] == "New")].describe()
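The filters above work by building boolean Series (masks) and passing them to the indexing operator. The following sketch, on a hypothetical four-row frame, names the masks explicitly. Each comparison must be wrapped in parentheses when combined inline, because `&` and `|` bind more tightly than `>` and `==` in Python.

```python
import pandas as pd

df = pd.DataFrame({
    "Inventory": [9, 0, 41, 0],
    "Condition": ["Used", "New", "New", "Refurbished"],
})

in_stock = df["Inventory"] > 0     # True, False, True, False
is_new = df["Condition"] == "New"  # False, True, True, False

both = df[in_stock & is_new]    # rows where both masks are True
either = df[in_stock | is_new]  # rows where at least one mask is True

print(both.index.tolist())    # [2]
print(either.index.tolist())  # [0, 1, 2]
```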

In [None]:
catalog["Price"].plot(kind="hist", bins=1000)