# Python for Data Analysis

McKinney, W. (2022). *Python for Data Analysis: Data Wrangling with Pandas, NumPy, and Jupyter.* O’Reilly Media.

Book Materials: https://github.com/wesm/pydata-book <br>
Machine Learning in Python: https://scikit-learn.org/stable/  <br>
Miniconda: https://docs.conda.io/en/latest/ <br>
Conda-Forge: https://conda-forge.org/ <br>

Python Libraries:
* NumPy
* pandas
* matplotlib
* SciPy
* statsmodels

Conventions:
```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import statsmodels as sm
```

It's recommended to work in a separate conda environment rather than in `(base)`. Using dedicated environments makes it easier to debug problems and keeps the `(base)` environment stable. It's also recommended to use **miniforge** instead of Anaconda, which reduces the chance of conflicting installations.

Conda commands:
- `conda env list`
- `conda create --name environment_name`
- `conda create --name my_env python=3.8`
- `conda activate environment_name`
- `conda deactivate`
- `conda install <package>`

## Index
* [NumPy Basics: Arrays and Vectorized Computation](#numpy-basics-arrays-and-vectorized-computation)
* [Data Types for ndarrays](#data-types-for-ndarrays)


## NumPy Basics: Arrays and Vectorized Computation

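NumPy arrays use much less memory than built-in Python sequences because they store their data in a contiguous block of memory. NumPy operations are also faster than regular Python code, since they work on entire arrays without Python `for` loops.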
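
As a rough illustration of both claims, the sketch below compares a Python list with an equivalent array. The size `n` and the names `py_list` and `np_arr` are arbitrary choices for this example, and the timings it prints will vary from machine to machine.

```python
import sys
import timeit

import numpy as np

n = 100_000  # arbitrary size chosen for this sketch
py_list = list(range(n))
np_arr = np.arange(n)

# Memory: the list holds pointers to separate int objects,
# while the array stores the raw values side by side.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print(f"list:    {list_bytes:,} bytes")
print(f"ndarray: {np_arr.nbytes:,} bytes")

# Speed: doubling every element with a list comprehension
# versus a single vectorized operation.
t_list = timeit.timeit(lambda: [x * 2 for x in py_list], number=10)
t_arr = timeit.timeit(lambda: np_arr * 2, number=10)
print(f"list comprehension: {t_list:.4f} s")
print(f"vectorized array:   {t_arr:.4f} s")
```

Both approaches produce the same values; only the storage layout and the place where the loop runs differ.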

### NumPy ndarray: Multidimensional Array Object

In [7]:
import numpy as np

data = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])

# Arithmetic operations are applied element-wise to the whole array
print(data * 10)
print(data + data)

# Checking the shape and dtype of 'data'
print(data.shape)
print(data.dtype)


[[ 15.  -1.  30.]
 [  0. -30.  65.]]
[[ 3.  -0.2  6. ]
 [ 0.  -6.  13. ]]
(2, 3)
float64
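
Element-wise behavior is not limited to `*` and `+`. The following minimal sketch, using two made-up arrays `arr` and `arr2`, shows a few more operations between arrays and scalars, plus the `ndim` attribute alongside `shape` and `dtype`.

```python
import numpy as np

arr = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
arr2 = np.array([[0.0, 4.0, 1.0], [7.0, 2.0, 12.0]])

# Scalars are applied to every element
squared = arr ** 2
inverse = 1 / arr

# Comparing two arrays of the same shape yields a Boolean array
greater = arr2 > arr

print(squared)
print(inverse)
print(greater)

# ndim is the number of dimensions; shape gives the size of each one
print(arr.ndim, arr.shape, arr.dtype)
```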


In [17]:
"""
The array function accepts any sequence-like object.
numpy.array tries to infer a good data type for the array.
"""
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
print(arr1)

# Creating a 2x4 array
data2 = [[1,2,3,4], [5,6,7,8]]
arr2 = np.array(data2)
print(arr2)
print(f"The shape is {arr2.shape}")

# Checking data type
print(f"\nThe data-type of Array1 is {arr1.dtype}")
print(f"The data-type of Array2 is {arr2.dtype}")

# Creating arrays full of ones
print(f"\nArray with 13 'ones': \n{np.ones(13)}")
print(f"Array 3x7 full of 'ones': \n{np.ones((3,7))}")

# Creating a new array without initializing its values
# (np.empty returns whatever was already in memory, not zeros)
print(f"\nArray 3x2x2 empty: \n{np.empty((3,2,2))}")

# Creating an array from a range (array-valued version of the built-in range)
print(f"\nArray list of 15 numbers: \n{np.arange(15)}")

[6.  7.5 8.  0.  1. ]
[[1 2 3 4]
 [5 6 7 8]]
The shape is (2, 4)

The data-type of Array1 is float64
The data-type of Array2 is int32

Array with 13 'ones': 
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
Array 3x7 full of 'ones': 
[[1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1.]]

Array 3x2x2 empty: 
[[[9.86556544e-312 3.16202013e-322]
  [0.00000000e+000 0.00000000e+000]]

 [[1.42413554e-306 3.40712186e+175]
  [7.56676808e-067 8.07452209e+169]]

 [[2.61795352e+180 5.50198861e+170]
  [3.29406006e-032 1.29255815e+161]]]

Array list of 15 numbers: 
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]


*Some important NumPy array creation functions (McKinney, 2022):*
![NumPy array creation functions](attachment:image.png)
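
A few of these creation functions can be tried out directly. This is only a small sample, and the variable names are chosen just for this sketch.

```python
import numpy as np

# Arrays filled with a fixed value
zeros = np.zeros((2, 3))     # all 0.0, float64 by default
filled = np.full((2, 2), 7)  # every element set to 7
identity = np.eye(3)         # 3x3 identity matrix

# The *_like variants copy the shape and dtype of an existing array
template = np.array([[1, 2, 3], [4, 5, 6]])
ones_like = np.ones_like(template)

# asarray converts its input to an ndarray, but does not copy
# when the input is already an ndarray
same = np.asarray(template)

print(zeros)
print(filled)
print(identity)
print(ones_like)
print(same is template)
```

`np.array` copies its input by default, so `np.asarray` is the better fit when a function should accept both lists and arrays without duplicating data.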


#### Data types for ndarrays

In [24]:
# Creating arrays and specifying Data type
arr1 = np.array([1.11, 2.22, 3.33], dtype=np.float64)
arr2 = np.array([1, 2, 3], dtype=np.int32)

print(f"Array1 ({arr1}) data type: {arr1.dtype}")
print(f"Array2 ({arr2}) data type: {arr2.dtype}")

# Converting ('casting') an array from one data type to another
flo_arr2 = arr2.astype(np.float64)
int_arr1 = arr1.astype(np.int64)
print(f"\nNow array2 ({flo_arr2}) has data type: {flo_arr2.dtype}")
print(f"Array1 ({arr1}) cast to '{int_arr1.dtype}' "
      f"is now '{int_arr1}'; the decimal part has been truncated")

# Converting numeric strings to their numeric form
num_srt = ["12.2", "13.3", "14"]
int_array = np.arange(10)

arr_num_srt = np.array(num_srt)
arr_num_flo = arr_num_srt.astype(float)

# Using another array's dtype as the target type
arr_num_int = arr_num_flo.astype(int_array.dtype)


print(f"\nThe array '{arr_num_srt}' is '{arr_num_srt.dtype}' "
      f"\nbut we can convert it to '{arr_num_flo}', which is '{arr_num_flo.dtype}'"
      f".\nAnd we can use the data type of '{int_array}' ({int_array.dtype}) to "
      f"convert it to '{arr_num_int.dtype}': '{arr_num_int}'.")



Array1 ([1.11 2.22 3.33]) data type: float64
Array2 ([1 2 3]) data type: int32

Now array2 ([1. 2. 3.]) has data type: float64
Array1 ([1.11 2.22 3.33]) cast to 'int64' is now '[1 2 3]'; the decimal part has been truncated

The array '['12.2' '13.3' '14']' is '<U4' 
but we can convert it to '[12.2 13.3 14. ]', which is 'float64'.
And we can use the data type of '[0 1 2 3 4 5 6 7 8 9]' (int32) to convert it to 'int32': '[12 13 14]'.
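
Three details of `astype` are easy to miss, so here is a short sketch of them with made-up values: truncation goes toward zero (it does not round), the call returns a new array even when the dtype is unchanged, and a string that is not numeric raises a `ValueError`.

```python
import numpy as np

floats = np.array([3.7, -1.2, -2.6, 0.5])

# Casting float to int drops the decimal part (truncates toward zero)
ints = floats.astype(np.int32)
print(ints)

# astype returns a new array, even if the dtype stays the same
same_dtype = floats.astype(floats.dtype)
print(same_dtype is floats)
print(np.shares_memory(same_dtype, floats))

# A string that cannot be parsed as a number makes the cast fail
bad_strings = np.array(["1.5", "abc"])
try:
    bad_strings.astype(float)
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print(f"Cast failed: {exc}")
```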


*NumPy data types (McKinney, 2022):*

|Type|Type code|Description|
|:----|:------:|:---|
|int8, uint8  |  i1, u1  |Signed and unsigned 8-bit (1 byte) integer types|
|int16, uint16  |  i2, u2  | Signed and unsigned 16-bit integer types|
|int32, uint32  |  i4, u4  | Signed and unsigned 32-bit integer types|
|int64, uint64  |  i8, u8  | Signed and unsigned 64-bit integer types|
|float16  |  f2  | Half-precision floating point|
|float32  |  f4 or f  | Standard single-precision floating point; compatible with C float|
|float64  |  f8 or d  | Standard double-precision floating point; compatible with C double and Python float object|
|float128  |  f16 or g  | Extended-precision floating point|
|complex64,  <br> complex128,  <br> complex256  |  c8, c16, c32  | Complex numbers represented by two 32-, 64-, or 128-bit floats, respectively|
|bool  |  ?  | Boolean type storing True and False values|
|object  |  O  | Python object type; a value can be any Python object|
|string_  |  S  |Fixed-length ASCII string type (1 byte per character); for example, to create a string data type with length 10, use 'S10'|
|unicode_  |  U  |Fixed-length Unicode type (number of bytes platform specific); same specification semantics as string_ (e.g., 'U10')|
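
The type codes in the table can be passed as strings wherever a dtype is expected. The sketch below checks a handful of them; the `codes` list is just a selection for illustration, and extended-precision types such as `float128` are left out because their availability depends on the platform.

```python
import numpy as np

# A selection of type codes from the table above
codes = ["i1", "u1", "i4", "f8", "c16", "?", "S10", "U10"]

for code in codes:
    dt = np.dtype(code)
    # itemsize is the number of bytes used per element
    print(f"{code:>4} -> kind={dt.kind!r} itemsize={dt.itemsize}")

# A type code works as a shorthand for the full dtype
unsigned = np.zeros(3, dtype="u4")
print(unsigned.dtype)
```

In the codes, the number is the size in bytes for numeric types (`i4` is a 4-byte, 32-bit integer) and the length in characters for the string types.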
