# Table of contents
- Pandas Foundations

# Pandas Foundations

### Importing pandas and NumPy

In [None]:
import pandas as pd
import numpy as np

### Structured data vs Unstructured data
Data that is stored in tables, such as CSV files, Excel spreadsheets, or database tables, is structured. Unstructured data consists of free-form text, images, sound, or video. If you find yourself dealing with structured data, pandas is well suited to the task.

## DataFrame

### Dataset Description

This dataset is composed of a curated collection of over 200 publicly available COVID-19 related datasets from sources like Johns Hopkins, the WHO, the World Bank, the New York Times, and many others. It includes data on a wide variety of potentially powerful statistics and indicators, like local and national infection rates, global social distancing policies, geospatial data on movement of people, and more.

The primary data structure in pandas is called a DataFrame. DataFrames are the pandas equivalent of a NumPy 2D ndarray, with a few key differences:

- Axis values can have string labels, not just numeric ones.
- DataFrames can contain columns with different data types, including integer, float, and string.
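
As a minimal sketch of these two differences, the toy frame below uses string row labels and mixes integer, float, and string columns. The values and column names are invented for illustration and are not taken from the COVID-19 dataset.

```python
import pandas as pd

# Toy data invented for illustration; not part of the UNCOVER dataset.
toy = pd.DataFrame(
    {
        "count": [3, 5, 2],               # integer column
        "rate": [0.5, 1.25, 0.75],        # float column
        "label": ["low", "high", "mid"],  # string column
    },
    index=["a", "b", "c"],                # string row labels
)

print(toy.dtypes)            # one data type per column
print(toy.loc["b", "rate"])  # select by row label and column label
```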


![df_anatomy_static_resized.svg](attachment:df_anatomy_static_resized.svg)
<small>Image Credit:DataQuest</small>

### Creating DataFrame

In [None]:

# Build a DataFrame from a dictionary that maps column names to lists of values
data = {"day": [0, 1, 2, 3, 4, 5, 6],
        "sleep": [9, 7, 6, 5, 8, 7, 9],
        "work": [6, 7, 9, 6, 7, 8, 8]}

df = pd.DataFrame(data)
print(df)


### Reading Data into DataFrame

In [None]:
death=pd.read_csv('/kaggle/input/uncover/UNCOVER_v4/UNCOVER/USAFacts/confirmed-covid-19-deaths-in-us-by-state-and-county.csv')
death.head()

In [None]:
death.tail()

### DataFrame attributes

In [None]:
columns = death.columns
index = death.index
# data = death.to_numpy()
data = death.values

In [None]:
columns

In [None]:
index

In [None]:
data

**Type of each DataFrame component**

In [None]:
type(columns)

The `Index` objects are implemented using hash tables that allow for very fast selection and data alignment.
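
The kind of label lookup this enables can be sketched on a small `Index`. The labels below are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical column labels, used only for illustration.
idx = pd.Index(["county", "state", "deaths"])

print("state" in idx)         # membership test on the labels
print(idx.get_loc("deaths"))  # integer position of a label
```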

In [None]:
# columns.to_numpy()
columns.values

In [None]:
type(index)

In [None]:
# index.to_numpy()
index.values

A `RangeIndex` is a subclass of `Index` that is analogous to Python's `range` object.
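
A short sketch of that analogy, building a `RangeIndex` by hand rather than taking it from the dataset:

```python
import pandas as pd

r = pd.RangeIndex(start=0, stop=5, step=1)

print(isinstance(r, pd.Index))       # RangeIndex is a kind of Index
print(list(r) == list(range(0, 5)))  # same values as the equivalent range
print(r.start, r.stop, r.step)       # it stores only these three numbers
```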

In [None]:
type(data)

The `.values` DataFrame attribute returns a NumPy n-dimensional array, or `ndarray`.
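
The commented-out `.to_numpy()` calls in the cells above are the method form of the same conversion. The sketch below, on a toy numeric frame with invented values, suggests that both return an equivalent `ndarray`:

```python
import numpy as np
import pandas as pd

# Toy numeric frame, invented for illustration.
toy = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

arr_values = toy.values      # attribute form
arr_method = toy.to_numpy()  # method form

print(type(arr_values))
print(arr_values.shape)      # (rows, columns)
print(np.array_equal(arr_values, arr_method))
```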

## Data Types
### Continuous data vs Categorical data
Continuous data is always numeric and represents some kind of measurement, such as height, wage, or salary. Continuous data can take on an infinite number of possible values. Categorical data, on the other hand, represents a discrete, finite set of values, such as car color, type of poker hand, or brand of cereal.

The following describes common pandas data types:

- `float` – The NumPy float type, which supports missing values
- `int` – The NumPy integer type, which does not support missing values
- `Int64` – pandas nullable integer type
- `object` – The NumPy type for storing strings (and mixed types)
- `category` – pandas categorical type, which does support missing values
- `bool` – The NumPy Boolean type, which does not support missing values (None becomes False, np.nan becomes True)
- `boolean` – pandas nullable Boolean type
- `datetime64[ns]` – The NumPy date type, which does support missing values (NaT)
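
To illustrate the difference between `int` and `Int64` in the list above, here is a small sketch with invented values. It shows what happens when an integer column contains a missing value.

```python
import pandas as pd

# With the default NumPy-backed types, a missing value forces integers to float.
default = pd.Series([1, 2, None])
print(default.dtype)

# The nullable Int64 type keeps the integers and stores the gap as <NA>.
nullable = pd.Series([1, 2, None], dtype="Int64")
print(nullable.dtype)
print(nullable.isna().sum())  # number of missing entries
```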

In [None]:
death.dtypes

Use the `.value_counts` method to return the count of each data type:

In [None]:
death.dtypes.value_counts()

In [None]:
death.info()

## Column Selection 

In [None]:
# death['state_name']
death.state_name

### .loc and .iloc
With `.loc`, you specify a selector for both rows and columns, separated by a comma. The row selector is a slice with no start or end label (`:`), which selects all of the rows. The column selector pulls out the column named `state_fips`.

The `.iloc` indexer also takes both row and column selectors, but by integer position. The row selector is a slice with no start or end index (`:`), which selects all of the rows. The column selector, `3`, pulls out the fourth column:

In [None]:
death.loc[:,'state_fips']

In [None]:
death.iloc[:,3]
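
The relationship between the two indexers can be sketched on a toy frame. The column names and values here are hypothetical, not those of the dataset.

```python
import pandas as pd

# Toy frame with invented values and hypothetical column names.
toy = pd.DataFrame(
    {"state": ["AL", "AK", "AZ"], "fips": [1, 2, 4], "deaths": [10, 0, 7]}
)

by_label = toy.loc[:, "fips"]  # all rows, column named "fips"
by_position = toy.iloc[:, 1]   # all rows, second column (position 1)

print(by_label.equals(by_position))  # both select the same column
```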

## Series
A Series is similar to Python's built-in list, but it differs in that each element has an associated label; together these labels form the index. This feature also makes a Series resemble an associative array or dictionary.
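
A minimal sketch of this dual behavior, using invented values and labels:

```python
import pandas as pd

# Invented values; the labels play the role of dictionary keys.
s = pd.Series([9, 7, 6], index=["mon", "tue", "wed"])

print(s["tue"])       # label-based lookup, like a dictionary
print(s.iloc[0])      # position-based lookup, like a list
print(list(s.index))  # the labels themselves
```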

**Attributes and methods of Series and DataFrames**

In [None]:
s_attr_methods = set(dir(pd.Series))
print(len(s_attr_methods))
df_attr_methods = set(dir(pd.DataFrame))
print(len(df_attr_methods))
print(len(s_attr_methods & df_attr_methods))

In [None]:
state_name = death['state_name']
deaths_s = death['deaths']
print(state_name.dtype)
print(deaths_s.dtype)

In [None]:
state_name.head()

In [None]:
state_name.sample(n=5,random_state=1100)

In [None]:
state_name.value_counts()

In [None]:
deaths_s.value_counts()

In [None]:
print(state_name.size)
print(state_name.shape)
print(len(state_name))
print(state_name.count())
print(state_name.unique())

**Basic summary statistics:** `.min`, `.max`, `.mean`, `.median`, and `.std`

In [None]:
print(deaths_s.min())
print(deaths_s.max())
print(deaths_s.mean())
print(deaths_s.median())
print(deaths_s.std())

**The `.describe` method**

In [None]:
deaths_s.describe()

**The `.quantile` method**

In [None]:
print(deaths_s.quantile(0.2))
print(deaths_s.quantile([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]))

**The `.isna`, `.fillna`, and `.dropna` methods**

In [None]:
death['lat'].isna()

In [None]:
lat_filled = death['lat'].fillna(0)
print(lat_filled.count())

In [None]:
lat_dropped = death['lat'].dropna()
print(lat_dropped.count())

In [None]:
death['lat'].hasnans

In [None]:
death['lat'].notna()
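
Because the output of these cells depends on the dataset, the following sketch repeats the same calls on a three-element Series with one missing value. The numbers are invented.

```python
import numpy as np
import pandas as pd

# Invented latitude-like values with one missing entry.
lat = pd.Series([32.5, np.nan, 61.4])

print(lat.isna().tolist())    # True marks the missing entry
print(lat.fillna(0).count())  # count after replacing the gap with 0
print(lat.dropna().count())   # count after removing the gap
print(lat.hasnans)            # whether any value is missing
```

Note that `.count` ignores missing values, which is why the filled and dropped versions report different counts.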

## Series operations
Series and DataFrames support many of the Python operators. Typically, a new Series or DataFrame is returned when using an operator.

**Arithmetic operators**

plus (`+`), minus (`-`), multiplication (`*`), division (`/`), exponentiation (`**`), floor division (`//`) and remainder (`%`)

In [None]:
deaths_s+6

In [None]:
deaths_s-10

In [None]:
deaths_s ** 2

**Comparison operators**

There are six comparison operators: greater than (`>`), less than (`<`), greater than or equal to (`>=`), less than or equal to (`<=`), equal to (`==`), and not equal to (`!=`). Each comparison operator turns each value in the Series into `True` or `False` based on the outcome of the condition. The result is a Boolean Series, which is useful for filtering.

In [None]:
deaths_s > 10

In [None]:
deaths_s != 0
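
As a sketch of how such a Boolean result can be used for filtering, the example below applies a mask to a small Series of invented counts:

```python
import pandas as pd

deaths = pd.Series([0, 12, 3, 25])  # invented counts

mask = deaths > 10            # Boolean Series, one value per element
print(mask.tolist())
print(deaths[mask].tolist())  # keep only the elements where the mask is True
print(mask.sum())             # True counts as 1, so this counts the matches
```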

All of the operators have method equivalents that produce the exact same result. For instance, `deaths_s + 6` from the cell above can be reproduced with the `.add` method.

Using the method rather than the operator can be useful when we chain methods together.



In [None]:
deaths_s.add(6)
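
The benefit for chaining can be sketched as follows, again with invented values. The method form reads left to right, while the operator form needs nested parentheses.

```python
import pandas as pd

s = pd.Series([1, 2, 3])  # invented values

chained = s.add(6).mul(2).gt(15)     # add 6, double, then compare with 15
with_operators = ((s + 6) * 2) > 15  # same computation using operators

print(chained.tolist())
print(chained.equals(with_operators))
```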

Following is a table of operators and the corresponding methods:

| Operator group | Operator | Series method name |
|----------------|----------|--------------------|
| Arithmetic | `+,-,*,/,//,%,**` | `.add, .sub, .mul, .div, .floordiv, .mod, .pow` |
| Comparison | `<,>,<=,>=,==,!=` | `.lt, .gt, .le, .ge, .eq, .ne` |
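
One way to check the pairs in this table is to compare each operator with its method on a small Series. This is only a sketch: the values are invented, and the standard-library `operator` module stands in for the operator symbols.

```python
import operator

import pandas as pd

s = pd.Series([7, 8, 9])  # invented values

# Each operator function paired with the Series method name from the table.
pairs = [
    (operator.add, "add"), (operator.sub, "sub"), (operator.mul, "mul"),
    (operator.truediv, "div"), (operator.floordiv, "floordiv"),
    (operator.mod, "mod"), (operator.pow, "pow"),
    (operator.lt, "lt"), (operator.gt, "gt"), (operator.le, "le"),
    (operator.ge, "ge"), (operator.eq, "eq"), (operator.ne, "ne"),
]

# Apply both forms with the operand 2 and record whether the results match.
results = {name: op(s, 2).equals(getattr(s, name)(2)) for op, name in pairs}
print(all(results.values()))
```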