# Iris Classification

## Data Generation
This notebook:
- Loads the Iris data from sklearn
- Converts the data to a pandas DataFrame
- Creates a directory for output
- Saves the DataFrame to a CSV file

## Load Data

In [None]:
import os
import pandas as pd
from sklearn.datasets import load_iris

In [None]:
# Load data
data = load_iris()

In [None]:
# Data is stored in a sklearn Bunch object
type(data)

In [None]:
# data has a number of items stored in it
# Calling keys() lists the names under which the contained info can be accessed
data.keys()

## Explore Bunch Data

We'll focus on 3 of the keys in `data`:
- data: numpy array of numerical values (150 rows, 4 cols)
- target: numpy array of numerical values indicating each row's class
- feature_names: list of 4 col names

**NOTE:** A Bunch exposes its keys as attributes, so `data['target']` is equivalent here to `data.target`
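
As a quick check of that equivalence, the sketch below loads the dataset again and compares the two access styles. The variable name `same_object` is only for illustration and is not part of the notebook.

```python
from sklearn.datasets import load_iris

data = load_iris()

# Key access and attribute access return the very same array object
same_object = data["target"] is data.target
print(same_object)

# Shapes of the items this section focuses on
print(data.data.shape, data.target.shape, len(data.feature_names))
```

Either style works. Key access is handy when the key name is held in a variable.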

In [None]:
# Explore data feature
print(type(data['data']))
print(data['data'].shape)

In [None]:
# Show data array (optional, as the array is rather large to display here)
# data['data']

In [None]:
# Show target values
data.target

In [None]:
# Check target labels
data.target_names

In [None]:
# List col names
data.feature_names

## Convert to Pandas df

Pandas DataFrames (dfs) are one of the most common ways to store, process, and analyze data.

This section converts parts of the bunch object into a df.

In [None]:
# Convert data to pandas dataframe
# data is the numpy array, columns is list of col names
df = pd.DataFrame(data=data.data, columns=data.feature_names)
# Add column with target (iris classification)
df["target"] = data.target
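
As an alternative sketch, and assuming scikit-learn 0.23 or newer, `load_iris` can return the data as pandas objects directly via `as_frame=True`. The names `iris` and `df_alt` are illustrative only; the notebook itself builds `df` by hand as shown above.

```python
from sklearn.datasets import load_iris

# as_frame=True asks sklearn to return pandas objects instead of numpy arrays
iris = load_iris(as_frame=True)

# .frame holds the 4 feature columns plus a "target" column
df_alt = iris.frame
print(df_alt.shape)
print(list(df_alt.columns))
```

Building the DataFrame manually, as this notebook does, makes each step explicit and also works with older scikit-learn versions.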

In [None]:
# Calling head on the df shows the top rows of that dataframe
df.head()

In [None]:
# shape returns the number of rows and columns
df.shape

In [None]:
# Check target values and counts
df.target.value_counts()
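
The numeric target values can be hard to read on their own. One possible way to see the counts by class name is to map each code to its entry in `target_names`. This is a sketch only: `code_to_name`, `species`, and `counts` are hypothetical names, and the notebook's `df` is left untouched.

```python
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()

# Build a lookup from numeric class code to class name
code_to_name = {i: str(name) for i, name in enumerate(data.target_names)}

# Translate each target code to its name, then count rows per class
species = pd.Series(data.target).map(code_to_name)
counts = species.value_counts()
print(counts)
```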

In [None]:
# Check updated df
df.head()

## Save df to csv file

Creates a data directory and saves df as csv to it

In [None]:
# check current location
os.getcwd()

In [None]:
# Build a relative path to a dir called data, one level up
data_path = os.path.join('..', 'data')
print(data_path)

In [None]:
# Create dir if not already there
os.makedirs(data_path, exist_ok=True)
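
To illustrate what `exist_ok=True` does, the sketch below runs `os.makedirs` twice on the same path inside a throwaway temporary directory, so it does not touch the notebook's real `data` directory. The names `demo_path`, `created`, and `raised` are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "data")

    # First call creates the directory; second call is a no-op thanks to exist_ok=True
    os.makedirs(demo_path, exist_ok=True)
    os.makedirs(demo_path, exist_ok=True)
    created = os.path.isdir(demo_path)

    # Without exist_ok=True, an existing directory raises FileExistsError
    try:
        os.makedirs(demo_path)
        raised = False
    except FileExistsError:
        raised = True

print(created, raised)
```

This is why the cell above can be re-run safely every time the notebook is executed.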

In [None]:
# Add file name to path
data_file_path = os.path.join(data_path, 'data.csv')
print(data_file_path)

In [None]:
# Save df to path
df.to_csv(data_file_path, index=False)
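
A simple way to confirm the file was written as intended is to read it back. The sketch below uses a small made-up frame (`sample`) and a temporary directory rather than the notebook's `df` and `data_file_path`, and it also shows what `index=False` avoids.

```python
import os
import tempfile

import pandas as pd

# Small illustrative frame; the values are placeholders, not the full dataset
sample = pd.DataFrame({"sepal length (cm)": [5.1, 4.9, 6.3], "target": [0, 0, 2]})

with tempfile.TemporaryDirectory() as tmp:
    # Save without the index, as the notebook does, and read it back
    path = os.path.join(tmp, "data.csv")
    sample.to_csv(path, index=False)
    restored = pd.read_csv(path)

    # Save with the default index=True for comparison
    path_with_index = os.path.join(tmp, "with_index.csv")
    sample.to_csv(path_with_index)
    restored_with_index = pd.read_csv(path_with_index)

same_columns = list(restored.columns) == list(sample.columns)
same_shape = restored.shape == sample.shape
extra_columns = [c for c in restored_with_index.columns if c not in sample.columns]
print(same_columns, same_shape, extra_columns)
```

With the default `index=True`, the row index is written as an extra unnamed column, which is why the notebook passes `index=False`.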