### Pandas

pandas is well suited to many kinds of data:

* Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
* Ordered and unordered (not necessarily fixed-frequency) time series data
* Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
* Any other form of observational / statistical data sets. The data need not be labeled at all to be placed into a pandas data structure

Some of the things pandas does well:

* Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data
* Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects
* Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. align the data automatically in computations
* Powerful, flexible group by functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
* Easy conversion of ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
* Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
* Intuitive merging and joining of data sets
* Flexible reshaping and pivoting of data sets
* Hierarchical labeling of axes (possible to have multiple labels per tick)
* Robust IO tools for loading data from flat files (CSV and delimited), Excel files, and databases, and for saving and loading data in the ultrafast HDF5 format
* Time series-specific functionality: date range generation and frequency conversion, moving window statistics, date shifting, and lagging
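
A few of the features listed above (missing-data handling, split-apply-combine, and automatic alignment) can be sketched on small, made-up data. The column names and values below are hypothetical and only serve as an illustration, not as part of the original notebook.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10.0, None, 30.0, 5.0],  # None is stored as NaN in a float column
})

# Missing data: reductions such as mean() skip NaN by default.
mean_units = sales["units"].mean()

# Split-apply-combine: split by region, sum each group, combine into a Series.
units_by_region = sales.groupby("region")["units"].sum()

# Automatic alignment: Series are matched on their labels, not on position.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "y", "w"])
aligned_sum = a + b  # labels present in only one Series produce NaN

print(mean_units)
print(units_by_region)
print(aligned_sum)
```

Labels that appear in only one of the two Series (`x` and `w` here) come out as NaN, which is how alignment and missing-data handling work together.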

#### Documentation:

[Package Overview](https://pandas.pydata.org/docs/getting_started/overview.html)

[Reference](https://pandas.pydata.org/docs/reference/io.html#flat-file)
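
The flat-file section of the reference covers `read_csv` and `to_csv`. As a minimal sketch that needs no file on disk, the round trip below uses an in-memory buffer; the CSV text is a hypothetical stand-in for a real file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "day,temp\nMonday,12\nTuesday,14\nWednesday,15\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# index=False leaves the row index out of the written CSV,
# so reading it back yields the same columns.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))

print(df)
print(round_trip.equals(df))
```

Without `index=False`, `to_csv` also writes the row index as an extra unnamed first column.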

In [1]:
import pandas as pd
import os

OUTPUT_FOLDER = "../resources/output"
WEATHER_FILE = "../resources/weather_data.csv"
STUDENTS_FILE = os.path.join(OUTPUT_FOLDER, "students_data.csv")

os.makedirs(OUTPUT_FOLDER, exist_ok=True)

# read_csv returns a DataFrame; selecting a single column returns a Series
weather_df = pd.read_csv(WEATHER_FILE)
print(type(weather_df))
print(type(weather_df["temp"]))

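# to_dict() maps each column name to a {row index: value} dictionary
weather_dicts = weather_df.to_dict()
# single quotes inside the f-string keep this valid before Python 3.12
print(f"DF to dictionary of dictionaries: {weather_dicts['temp']}")
temperature_list = weather_df["temp"].to_list()
print(f"Series to list: {temperature_list}")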

print("Data analysis on series:")
print(f"Average temperature: {weather_df.temp.mean()}")
print(f"Max temperature: {weather_df.temp.max()}")

max_temp_slice = weather_df[weather_df.temp == weather_df.temp.max()]
hotter_days_slice = weather_df[weather_df.temp > weather_df.temp.mean()]
print(f"Slice row with the max temperature:\n {max_temp_slice}")
print(f"List of hotter than average days: {hotter_days_slice.day.to_list()}")

# create dataframe from dictionary:
students_dict = {
    "student": ["Amy", "James", "Angela"],
    "score": [76, 56, 65]
}

students_df = pd.DataFrame(students_dict)
print(students_df)

# loop through each row of the DataFrame:
for index, row in students_df.iterrows():
    print(f"{index} - {row.student} - {row.score}")

# create csv from DataFrame:
students_df.to_csv(STUDENTS_FILE)


AttributeError: module 'pandas' has no attribute 'read_csv'
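
One common cause of this kind of `AttributeError` is a local file or folder named `pandas` (for example `pandas.py`) that shadows the installed library. This is an assumption here, since the notebook does not show the project layout. A minimal sketch for checking where the name would be loaded from, using only the standard library (`module_origin` is a hypothetical helper, not part of pandas):

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# If this path points into the project folder rather than site-packages,
# a local file is probably shadowing the real library (assumed cause).
print(module_origin("pandas"))
```

If the printed path points at a file in the working directory, renaming that file and restarting the kernel would be the usual next step.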