# The Pandas Data Structure Known as the Series
### What is Pandas?
#### Pandas is a library widely used in the data science community. It wraps arrays in labeled data structures and provides optimized implementations of many common operations.

### Would I use Pandas for everything?
#### Nope. Machine learning (see 004_sklearn_pandas_linearRegress_opticsMoorningData) likes single-dimensional arrays.
#### But I would use Pandas to read, prep, and then marshal data into the structure my machine learning API wants.
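
A minimal sketch of that hand-off, assuming an API that wants plain NumPy arrays: a two-dimensional feature array and a one-dimensional target array. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: one feature column and one target column.
frame = pd.DataFrame({"wavelength": [1.0, 2.0, 3.0, 4.0],
                      "intensity": [2.1, 3.9, 6.2, 7.8]})

# Prep in Pandas: drop incomplete rows before handing the data off.
frame = frame.dropna()

# Marshal into NumPy: features as a 2-D array (rows, columns), target as 1-D.
X = frame[["wavelength"]].to_numpy()
y = frame["intensity"].to_numpy()

print(X.shape, y.shape)
```

Selecting with a list of column names (`frame[["wavelength"]]`) keeps the result two-dimensional, while selecting a single name returns a Series.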

### Are there other options?

+ [Dask](https://www.dask.org/):
Provides a Pandas-like API that can handle datasets larger than memory by distributing computations across multiple cores or machines.

+ [Polars](https://pola.rs/):
A newer library with a focus on performance and memory efficiency, making it excellent for large datasets and complex operations.

+ [Xarray](https://docs.xarray.dev/en/stable/): For multi-dimensional labeled arrays, similar to Pandas DataFrames but with additional functionality for handling dimensions.

+ [CuPy](https://cupy.dev/): An open-source array library for GPU-accelerated computing with Python. CuPy uses CUDA Toolkit libraries including cuBLAS, cuRAND, cuSOLVER, cuSPARSE, cuFFT, cuDNN and NCCL to make full use of the GPU architecture.
Most operations perform well on a GPU using CuPy out of the box, and some are sped up more than 100x relative to NumPy.

### Setup and Install minimally required libraries

In [None]:
# Import the libraries needed to install additional packages at runtime
import sys
# Use subprocess to run operating system commands from the program. The notebook
# "bang" (!) syntax also works, but it does not carry over to a plain Python script,
# so subprocess is the more portable approach.
import subprocess
import importlib.util

# Identify the libraries you'd like to add to this Runtime environment.
libraries=["rich", "rich[jupyter]", "unidecode",
           "polars[all]", "dask[complete]", "xarray",]

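# Loop through each library and test for existence; if not present, install quietly.
# Note: find_spec expects an importable module name, so entries with extras such as
# "rich[jupyter]" never resolve and are always handed to pip, which skips
# requirements that are already satisfied.
for library in libraries:
    if library == "Pillow":
        # Example of a package whose import name differs from its install name
        spec = importlib.util.find_spec("PIL")
    else:
        spec = importlib.util.find_spec(library)
    if spec is None:
        print("Installing library " + library)
        # Run pip through the current interpreter so it targets this environment
        subprocess.run([sys.executable, "-m", "pip", "install", library, "--quiet"], check=True)
    else:
        print("Library " + library + " already installed.")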

Library rich already installed.
Installing library rich[jupyter]
Library unidecode already installed.
Installing library polars[all]
Installing library dask[complete]
Library xarray already installed.


In [None]:
import numpy as np
import pandas as pd
import polars as pl
import dask as da
import xarray as xr

## Quick Pro-tips

#### References:

+ [Polars Config](https://docs.pola.rs/api/python/stable/reference/config.html)

+ [Pandas Config](https://pandas.pydata.org/docs/user_guide/options.html)

+ [Dask Config](https://docs.dask.org/en/latest/configuration.html)

+ [Xarray Config](https://docs.xarray.dev/en/stable/generated/xarray.set_options.html)

In [None]:
#Library configuration examples using Pandas

#Show all rows returned from the dataset (could be HUGE, be careful)
pd.set_option('display.max_rows', None)
#or limit the display to 10 rows (this later call overrides the one above)
pd.set_option('display.max_rows', 10)

#Long floating-point values get tiring to read, so format them with 4 decimal places
pd.options.display.float_format = '{:,.4f}'.format

#NumPy equivalent
np.set_printoptions(precision=4)
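
The options above stay in effect for the rest of the session. As a sketch of an alternative, `pd.option_context` applies a setting only inside a `with` block and restores the previous value afterwards. The Series and the row limit of 6 are illustrative choices.

```python
import pandas as pd

series = pd.Series(range(100))
before = pd.get_option("display.max_rows")

# The setting applies only inside the "with" block.
with pd.option_context("display.max_rows", 6):
    inside = pd.get_option("display.max_rows")
    print(series)

# The previous value is restored on exit.
after = pd.get_option("display.max_rows")
print(before, inside, after)
```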

## Series

### Pandas Series

One-dimensional ndarray with axis labels (including time series).

Labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as NaN).

Reference: https://pandas.pydata.org/docs/reference/api/pandas.Series.html
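
Two points from the description above, sketched with made-up values: labels may repeat, and statistical methods skip missing data unless asked not to.

```python
import numpy as np
import pandas as pd

# Duplicate labels are allowed; selecting "x" returns every match.
readings = pd.Series([1.0, 2.0, np.nan, 4.0], index=["x", "x", "y", "z"])
print(readings.loc["x"])

# Statistical methods skip NaN by default: this is the mean of 1.0, 2.0 and 4.0.
print(readings.mean())

# Passing skipna=False propagates the missing value instead.
print(readings.mean(skipna=False))
```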

### Polars Series



References: https://docs.pola.rs/py-polars/html/reference/series/index.html

In [None]:
#Series is a one-dimensional labeled array capable of holding any data type
series = pd.Series([1,2,3,4,5,'red','green','blue',6,7,8,9])
print(series)

0        1
1        2
2        3
3        4
4        5
      ... 
7     blue
8        6
9        7
10       8
11       9
Length: 12, dtype: object
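
The `dtype: object` in the output above is the cost of mixing numbers and strings: each element is stored as a generic Python object. A small sketch contrasting this with a homogeneous Series (the variable names are illustrative):

```python
import pandas as pd

mixed = pd.Series([1, 2, "red"])
numbers = pd.Series([1, 2, 3])

print(mixed.dtype)    # object
print(numbers.dtype)  # an integer dtype, typically int64

# Arithmetic on the homogeneous Series is applied element by element.
print((numbers * 2).tolist())  # [2, 4, 6]
```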


In [None]:
#If data is an ndarray, index must be the same length as data. If no index is passed, one will be created
series=pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
print(series)
print("-------------------------------------------------------------------")
print(series.index)
print("-------------------------------------------------------------------")
#Use .iloc for positional access; series[0] on a label-based index is deprecated
print(series.iloc[0])
print("-------------------------------------------------------------------")
print(series[:])

a    1.2373
b   -0.0949
c    0.3676
d   -0.2710
e   -0.0273
dtype: float64
-------------------------------------------------------------------
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
-------------------------------------------------------------------
1.2372780855703776
-------------------------------------------------------------------
a    1.2373
b   -0.0949
c    0.3676
d   -0.2710
e   -0.0273
dtype: float64
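
With a labeled index there are two explicit ways to select: `.loc` by label and `.iloc` by position. A short sketch with illustrative values:

```python
import pandas as pd

letters = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

print(letters.loc["b"])   # by label -> 20
print(letters.iloc[1])    # by position -> 20

# Label slices include the end label; positional slices exclude the end position.
print(letters.loc["a":"c"].tolist())  # [10, 20, 30]
print(letters.iloc[0:2].tolist())     # [10, 20]
```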


In [None]:
#notice that a series can be created from a classic (key=value pair) dictionary
d = {'b': 1, 'a': 0, 'c': 2}
series=pd.Series(d)
print(series)
print(series["b"])

b    1
a    0
c    2
dtype: int64
1
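
When a dictionary is combined with an explicit index, values are looked up by label, and labels missing from the dictionary become NaN. A brief sketch (the label `"x"` is deliberately absent from the dictionary):

```python
import pandas as pd

d = {"b": 1, "a": 0, "c": 2}

# Values are pulled by label in the order given; "x" has no entry in the dict.
picked = pd.Series(d, index=["a", "b", "x"])
print(picked)
```

Because NaN is a floating-point value, the resulting Series is `float64` rather than `int64`.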


In [None]:
#A Series behaves much like an ndarray and is a valid argument to most NumPy functions. However, operations such as slicing also slice the index.
#If data is an ndarray, index must be the same length as data. If no index is passed, one will be created
series=pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
print("Full array")
print("################################################################################################################")
print(series)
print("################################################################################################################")
print("")
print ("Just the first index")
print("    When directly indexed the 'index' is not included.")
print("################################################################################################################")
print(series.iloc[0])
print("")

print(" All values up to element #3")
print("################################################################################################################")
print(series[:3])
print("")

print ("Only those values greater than the median")
print("################################################################################################################")
print(series[series > series.median()])
print("")

print("Integrate with numpy and calculate the exponent, notice Numpy integration")
print("################################################################################################################")
print(np.exp(series))

Full array
################################################################################################################
a   -0.8796
b    0.4979
c    1.1537
d   -1.0412
e    0.5706
dtype: float64
################################################################################################################

Just the first index
    When directly indexed the 'index' is not included.
################################################################################################################
-0.8796269846113157

 All values up to element #3
################################################################################################################
a   -0.8796
b    0.4979
c    1.1537
dtype: float64

Only those values greater than the median
################################################################################################################
c   1.1537
e   0.5706
dtype: float64

Integrate with numpy and calculate the exponent, notice Numpy integration
################

In [None]:
#Series data type operations
print(series.dtype)

float64


In [None]:
#Get the actual array in a series, maybe for direct manipulation
print("Dump the contents of the Series into a single dimensional Numpy array.")
print("###############################################################################################")
print(series.values)
print("")
print("My series dimensions are: ",series.ndim)
print("My series size is:", series.size)
print("My series shape is:", series.shape)
print("")
print("###############################################################################################")
my_array=series.values
print("My array dimensions are: ",my_array.ndim)
print("My array size is:", my_array.size)
print("My array shape is:", my_array.shape)

print("")
print("###############################################################################################")
#Iterate over the array elements directly
for value in my_array:
    print(value)

Dump the contents of the Series into a single dimensional Numpy array.
###############################################################################################
[-0.8796  0.4979  1.1537 -1.0412  0.5706]

My series dimensions are:  1
My series size is: 5
My series shape is: (5,)

###############################################################################################
My array dimensions are:  1
My array size is: 5
My array shape is: (5,)

###############################################################################################
-0.8796269846113157
0.49789547055197225
1.1537373327718028
-1.0411748492131319
0.5706411354419607
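
If the array is meant for direct manipulation, it helps to be explicit about whether it shares memory with the Series. As a sketch, `to_numpy(copy=True)` requests an independent copy, so edits to the array leave the Series untouched. The values are illustrative.

```python
import pandas as pd

original = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

# copy=True guarantees the array does not share memory with the Series.
independent = original.to_numpy(copy=True)
independent[0] = 99.0

print(independent)
print(original["a"])  # still 1.0
```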


In [None]:
#Convert the Series to an xarray DataArray. Note the parentheses: without them
#the expression only displays the bound method and no conversion takes place.
data_array = series.to_xarray()
print(data_array)

In [None]:
#dictionary type structure example
print("Key 'a' access:",series['a'])
print("")
print("Example of a bad key request for 'z' with a check:", 'z' in series)
print("")
print ("or")
print("")
print ("Key 'z' access with a .get:", series.get('z'))
print("")
print ("or perhaps more elegant")
print("")
print("Key 'z' access with a .get and return for failure:", series.get('z','Not found'))


Key 'a' access: -0.8796269846113157

Example of a bad key request for 'z' with a check: False

or

Key 'z' access with a .get: None

or perhaps more elegant

Key 'z' access with a .get and return for failure: Not found


In [None]:
#vector manipulations
add_series=series+series
print("Series added to itself:\n", add_series)
print("###############################################################################################")
multiply_series=series * 2
print("")
print("Series multiplied by 2:\n", multiply_series)


Series added to itself:
 a   -1.7593
b    0.9958
c    2.3075
d   -2.0823
e    1.1413
dtype: float64
###############################################################################################

Series multiplied by 2:
 a   -1.7593
b    0.9958
c    2.3075
d   -2.0823
e    1.1413
dtype: float64
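
Arithmetic between two Series aligns on labels rather than position. A sketch with illustrative values, where the operands share only some labels:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])

# The left operand has labels b, c, d; the right has a, b, c.
total = s.iloc[1:] + s.iloc[:-1]
print(total)
```

Labels present in both operands ("b" and "c") are added; labels present in only one ("a" and "d") come back as NaN.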


In [None]:
#Series attributes
print("Name your data")
print("###############################################################################################")
print(series.name)
print("or")
series2 = series.rename("My Example Series")
print(series2.name)

Name your data
###############################################################################################
None
or
My Example Series
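
A short sketch of why the name matters, using made-up values: `rename` returns a new Series and leaves the original alone, and the name becomes the column label when the Series is converted to a DataFrame.

```python
import pandas as pd

temps = pd.Series([20.5, 21.0, 19.8], name="temperature_c")

# rename returns a new Series; the original keeps its name.
renamed = temps.rename("temp")
print(temps.name, renamed.name)

# The name becomes the column label in the resulting DataFrame.
frame = temps.to_frame()
print(list(frame.columns))
```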
