Pandas link: https://pandas.pydata.org/pandas-docs/stable/

https://www.tutorialspoint.com/python_pandas/python_pandas_introduction.htm

![Logo](imagenes/logo.png)
# Pandas

## Introduction to Pandas

Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built around its own data structures. The name Pandas is derived from "panel data", an econometrics term for multidimensional data sets.

In 2008, developer Wes McKinney started building pandas because he needed a high-performance, flexible tool for data analysis.

Prior to Pandas, Python was used mostly for data munging and preparation and contributed very little to data analysis itself. Pandas closed that gap. With Pandas we can carry out five typical steps in the processing and analysis of data, regardless of where the data comes from: load, prepare, manipulate, model, and analyze.

Python with Pandas is used in a wide range of academic and commercial fields, including finance, economics, statistics, and analytics.

## Key Features of Pandas
- Fast and efficient DataFrame object with default and customized indexing.
- Tools for loading data into in-memory data objects from different file formats.
- Data alignment and integrated handling of missing data.
- Reshaping and pivoting of data sets.
- Label-based slicing, indexing and subsetting of large data sets.
- Columns from a data structure can be deleted or inserted.
- Group by data for aggregation and transformations.
- High performance merging and joining of data.
- Time Series functionality.
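
A few of the features listed above can be seen in a minimal sketch. The Series, the DataFrame, and all values below are made up for illustration; they are not part of the original tutorial data.

```python
import pandas as pd

# Two Series with partially overlapping labels (hypothetical values).
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Data alignment: arithmetic matches on labels, not on position.
# "x" has no partner in b, so the result there is a missing value (NaN).
total = a + b
print(total)

# Group by: aggregate a column per group on a small DataFrame.
df = pd.DataFrame({"city": ["A", "A", "B"], "sales": [5, 7, 11]})
by_city = df.groupby("city")["sales"].sum()
print(by_city)

# Columns can be inserted and deleted.
df["sales_x2"] = df["sales"] * 2
df = df.drop(columns=["sales_x2"])
```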

<!-- Main functions: https://pandas.pydata.org/pandas-docs/stable/reference/general_functions.html -->

## General functions
## Data manipulations
| Function    | Description    |
| ----------- | ----------- |
| melt(frame[, id_vars, value_vars, var_name, ...])      | Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.      |
| pivot(data[, index, columns, values])      | Return reshaped DataFrame organized by given index / column values.      |
| pivot_table(data[, values, index, columns, ...])      | Create a spreadsheet-style pivot table as a DataFrame.      |
| crosstab(index, columns[, values, rownames, ...])      | Compute a simple cross tabulation of two (or more) factors.      |
| cut(x, bins[, right, labels, retbins, ...])      | Bin values into discrete intervals.      |
| qcut(x, q[, labels, retbins, precision, ...])      | Quantile-based discretization function.      |
| merge(left, right[, how, on, left_on, ...])      | Merge DataFrame or named Series objects with a database-style join.      |
| merge_ordered(left, right[, on, left_on, ...])      | Perform merge with optional filling/interpolation.      |
| merge_asof(left, right[, on, left_on, ...])      | Perform an asof merge.      |
| concat(objs[, axis, join, ignore_index, ...])      | Concatenate pandas objects along a particular axis with optional set logic along the other axes.      |
| get_dummies(data[, prefix, prefix_sep, ...])      | Convert categorical variable into dummy/indicator variables.      |
| factorize(values[, sort, na_sentinel, size_hint])      | Encode the object as an enumerated type or categorical variable.      |
| unique(values)      | Hash table-based unique.      |
| wide_to_long(df, stubnames, i, j[, sep, suffix])      | Wide panel to long format.      |
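
As a rough illustration of some functions in this table, the sketch below reshapes, joins, and bins a tiny data set. The column names (`id`, `q1`, `q2`, `name`) and all values are hypothetical.

```python
import pandas as pd

# Wide table: one row per id, one column per quarter (made-up data).
wide = pd.DataFrame({"id": [1, 2], "q1": [10, 30], "q2": [20, 40]})

# melt: wide -> long, keeping "id" as the identifier column.
long = pd.melt(wide, id_vars=["id"], var_name="quarter", value_name="amount")

# pivot: long -> wide again, one column per quarter.
back = long.pivot(index="id", columns="quarter", values="amount")

# merge: database-style left join on the shared "id" column.
names = pd.DataFrame({"id": [1, 2], "name": ["ann", "bob"]})
merged = pd.merge(wide, names, on="id", how="left")

# cut: bin values into labelled, right-closed intervals (0, 3], (3, 6], (6, 10].
levels = pd.cut(pd.Series([1, 5, 9]), bins=[0, 3, 6, 10], labels=["low", "mid", "high"])

# concat: stack two frames vertically and renumber the index.
stacked = pd.concat([wide, wide], ignore_index=True)
```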



## Top-level missing data
| Function    | Description    |
| ----------- | ----------- |
| isna(obj)      | Detect missing values for an array-like object.      |
| isnull(obj)      | Detect missing values for an array-like object.      |
| notna(obj)      | Detect non-missing values for an array-like object.      |
| notnull(obj)      | Detect non-missing values for an array-like object.      |
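
A short sketch of the missing-data helpers, using a made-up Series. `isnull` and `notnull` are aliases of `isna` and `notna`, so only the latter pair is shown.

```python
import numpy as np
import pandas as pd

# Both np.nan and None are treated as missing in a float Series.
values = pd.Series([1.0, np.nan, 3.0, None])

missing = pd.isna(values)    # True where a value is missing
present = pd.notna(values)   # the element-wise opposite

# The functions also accept scalars.
scalar_missing = pd.isna(None)

# A boolean mask can be used to keep only the non-missing entries.
clean = values[present]
```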


## Top-level conversions
| Function    | Description    |
| ----------- | ----------- |
| to_numeric(arg[, errors, downcast])      | Convert argument to a numeric type.      |
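
The `errors` and `downcast` parameters can be sketched as follows; the input strings are invented for the example.

```python
import pandas as pd

# Strings as they might arrive from a CSV file; "n/a" is not a number.
raw = pd.Series(["1", "2.5", "n/a"])

# errors="coerce" turns unparseable entries into NaN instead of raising.
nums = pd.to_numeric(raw, errors="coerce")

# downcast="integer" picks the smallest integer dtype that fits the values.
small = pd.to_numeric(pd.Series([1, 2, 3]), downcast="integer")
print(nums)
print(small.dtype)
```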



## Top-level dealing with datetimelike
| Function    | Description    |
| ----------- | ----------- |
| to_datetime(arg[, errors, dayfirst, ...])      | Convert argument to datetime.      |
| to_timedelta(arg[, unit, errors])      | Convert argument to timedelta.      |
| date_range([start, end, periods, freq, tz, ...])      | Return a fixed frequency DatetimeIndex.      |
| bdate_range([start, end, periods, freq, tz, ...])      | Return a fixed frequency DatetimeIndex, with business day as the default frequency.      |
| period_range([start, end, periods, freq, name])      | Return a fixed frequency PeriodIndex.      |
| timedelta_range([start, end, periods, freq, ...])      | Return a fixed frequency TimedeltaIndex, with day as the default frequency.      |
| infer_freq(index[, warn])      | Infer the most likely frequency given the input index.      |
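
A minimal sketch of the datetime-like helpers. The dates are arbitrary, and the `"%Y-%m"` format is an assumption chosen to match year-month strings such as `2020-01`.

```python
import pandas as pd

# Parse year-month strings; each becomes the first day of that month.
stamps = pd.to_datetime(["2020-01", "2020-02"], format="%Y-%m")

# Three consecutive calendar days.
days = pd.date_range(start="2020-01-01", periods=3, freq="D")

# Two business days starting on Friday 2020-01-03; the weekend is skipped.
bdays = pd.bdate_range(start="2020-01-03", periods=2)

# Parse a duration string into a Timedelta.
delta = pd.to_timedelta("1 days 06:00:00")

# Infer the frequency of a regular index (needs at least three dates).
inferred = pd.infer_freq(days)
```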


## Top-level dealing with intervals
| Function    | Description    |
| ----------- | ----------- |
| interval_range([start, end, periods, freq, ...])      | Return a fixed frequency IntervalIndex.      |
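
For illustration, a small `interval_range` call with arbitrary bounds:

```python
import pandas as pd

# Three intervals of width 2 covering 0 to 6: (0, 2], (2, 4], (4, 6].
intervals = pd.interval_range(start=0, end=6, freq=2)

# Intervals are closed on the right by default, so 4 belongs to (2, 4].
second = intervals[1]
print(intervals)
print(4 in second, 2 in second)
```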


## Top-level evaluation
| Function    | Description    |
| ----------- | ----------- |
| eval(expr[, parser, engine, truediv, ...])      | Evaluate a Python expression as a string using various backends.      |
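
A minimal sketch of `pd.eval` on a made-up DataFrame. Passing the frame through `local_dict` is one option; by default `pd.eval` looks names up in the calling scope.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# The expression is given as a string and evaluated by the selected backend.
total = pd.eval("df.a + df.b", local_dict={"df": df})
print(total)
```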

## Hashing
| Function    | Description    |
| ----------- | ----------- |
| util.hash_array(vals[, encoding, hash_key, ...])      | Given a 1d array, return an array of deterministic integers.      |
| util.hash_pandas_object(obj[, index, ...])     | Return a data hash of the Index/Series/DataFrame.     |
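
The hashing helpers can be sketched like this; the inputs are invented. Equal values produce equal hashes, which is what makes the result useful for deduplication or partitioning.

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", "a"])

# index=False hashes only the values, so equal values get equal hashes.
hashes = pd.util.hash_pandas_object(s, index=False)

# hash_array works directly on a 1d NumPy array.
arr_hashes = pd.util.hash_array(np.array([1, 2, 1]))
print(hashes)
```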

## Testing
| Function    | Description    |
| ----------- | ----------- |
| test([extra_args])     | Run the pandas test suite using pytest.      |

# Installing Pandas

In [None]:
# How do you install Pandas?
!pip install pandas
!pip install plotly

!pip install --upgrade nbformat

# Demo

In [1]:
## This tutorial shows how to view, manipulate, and clean data, and how to create charts from it
## Video https://www.youtube.com/watch?v=F-gDgQ6kuuk

import pandas as pd
import plotly.express as px

# Opening the data
df_gold_prices = pd.read_csv('data/monthly_csv.csv')

# Viewing data
# Tail shows the last rows
print(df_gold_prices.tail(20))

# Selecting columns from the DataFrame
dates = df_gold_prices['Date']
prices = df_gold_prices['Price']

# Simple operations: derive a buy price at 90% of the listed price
df_gold_prices['buy_price'] = prices * 0.9

print('Max Gold price')
print(df_gold_prices['Price'].max())

# Data cleaning: replace the '-' separator in the Date strings with a space
df_gold_prices['Date'] = df_gold_prices['Date'].str.replace('-', ' ')
print(df_gold_prices)

# Chart (dates and prices were selected before cleaning, so they hold the original values)
fig = px.line(df_gold_prices, x=dates, y=prices, title='Gold Prices over time')
fig.show()

        Date     Price
827  2018-12  1249.887
828  2019-01  1291.630
829  2019-02  1319.755
830  2019-03  1302.286
831  2019-04  1287.650
832  2019-05  1282.460
833  2019-06  1358.488
834  2019-07  1414.611
835  2019-08  1497.102
836  2019-09  1510.336
837  2019-10  1494.765
838  2019-11  1471.921
839  2019-12  1480.025
840  2020-01  1560.668
841  2020-02  1598.818
842  2020-03  1593.764
843  2020-04  1680.030
844  2020-05  1715.697
845  2020-06  1734.032
846  2020-07  1840.807
Max Gold price
1840.807
        Date     Price  buy_price
0    1950 01    34.730    31.2570
1    1950 02    34.730    31.2570
2    1950 03    34.730    31.2570
3    1950 04    34.730    31.2570
4    1950 05    34.730    31.2570
..       ...       ...        ...
842  2020 03  1593.764  1434.3876
843  2020 04  1680.030  1512.0270
844  2020 05  1715.697  1544.1273
845  2020 06  1734.032  1560.6288
846  2020 07  1840.807  1656.7263

[847 rows x 3 columns]
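
As an alternative to the string replacement in the cleaning step, the `Date` column could be parsed into real timestamps. This is only a sketch: the `sample` frame is a small stand-in built from three rows of the output above, not the full `data/monthly_csv.csv` file, and the `"%Y-%m"` format is assumed from how the dates are printed.

```python
import pandas as pd

# Stand-in for the CSV, using three rows shown in the demo output.
sample = pd.DataFrame(
    {
        "Date": ["2020-05", "2020-06", "2020-07"],
        "Price": [1715.697, 1734.032, 1840.807],
    }
)

# Parse year-month strings into timestamps (first day of each month).
sample["Date"] = pd.to_datetime(sample["Date"], format="%Y-%m")

# Same derived column as in the demo.
sample["buy_price"] = sample["Price"] * 0.9

# Find the row with the highest price, not just the maximum value.
peak = sample.loc[sample["Price"].idxmax()]
print(peak["Date"], peak["Price"])
```

With a datetime column, plotting libraries and pandas time-series tools can treat the x axis as time rather than as plain text.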
