# **PANDS Project**  

### **App Functionality**
The user is presented with a menu in which they can choose various options to interact with the iris dataset.

**Option 1 - Summarise the data:** 
- Allows the user to get the mean, min, max, median and std of each feature.  
- The user can choose a single statistic or all of them.  
- After the user chooses an option, the summary is written to a text file called summary.txt.  
- If summary.txt already exists, it will be overwritten (a confirmation prompt before overwriting may be added later).
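
A rough sketch of how the summary could be written to file with a confirmation before overwriting. The function name `write_summary` and its `confirm` parameter are placeholders I made up for illustration, not the final code; it assumes the features are in a NumPy array with one column per feature.

```python
from pathlib import Path

import numpy as np


def write_summary(features, feature_names, stats, path="summary.txt", confirm=input):
    """Write the chosen statistics for each feature to a text file.

    Returns True if the file was written, False if the user declined to overwrite.
    """
    stat_functions = {"mean": np.mean, "min": np.min, "max": np.max,
                      "median": np.median, "std": np.std}
    path = Path(path)

    # Ask before replacing an existing file (hypothetical confirmation step)
    if path.exists():
        answer = confirm(f"{path} already exists. Overwrite? (y/n): ")
        if answer.strip().lower() != "y":
            return False

    lines = []
    for stat in stats:
        lines.append(f"{stat.capitalize()} of each feature:")
        for col, name in enumerate(feature_names):
            value = stat_functions[stat](features[:, col])
            lines.append(f"{name}: {value:.2f}")
        lines.append("")  # blank line between statistics

    path.write_text("\n".join(lines))
    return True
```

Passing `confirm` as a parameter means the prompt can be swapped out when trying the function without typing at the keyboard.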

**Option 2 - Generate Histogram:**   
- Allows the user to generate a histogram PNG of any chosen feature or of all features.  
- Potentially add the option of getting a histogram for a chosen species.  
- The PNG will be saved and named according to the feature chosen (e.g. petal_length_hist.png, sepal_length_hist.png).
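
A small sketch of how the file name could be built from a feature name. The scikit-learn feature names include the unit, e.g. `sepal length (cm)`, so the unit is dropped first. `histogram_filename` is a hypothetical helper name, not something already in the project.

```python
import re


def histogram_filename(feature_name):
    """Turn a feature label such as 'petal length (cm)' into 'petal_length_hist.png'."""
    # Drop a trailing unit in parentheses, e.g. " (cm)"
    base = re.sub(r"\s*\(.*?\)\s*$", "", feature_name)
    # Replace runs of non-alphanumeric characters with a single underscore
    base = re.sub(r"[^0-9a-zA-Z]+", "_", base.strip().lower()).strip("_")
    return f"{base}_hist.png"


print(histogram_filename("petal length (cm)"))  # petal_length_hist.png
```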

**Option 3 - Generate Scatterplot:**   
- The user can choose two features and have them plotted against each other in a scatterplot.  
- Potentially add the option to include a regression line.
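
If the regression line gets added, one possible way to compute it is a degree-1 least-squares fit with `np.polyfit`. This is only a sketch under that assumption; `regression_line` is a made-up helper name and the sample values are invented so the result is easy to check by eye.

```python
import numpy as np


def regression_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    # A degree-1 polynomial fit returns the coefficients highest power first
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)


# Made-up points lying exactly on y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
slope, intercept = regression_line(x, y)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

The line itself could then be drawn over the scatterplot by plotting `slope * x + intercept` against `x`.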

### **Main menu (analysis.py)**  
I was tasked with making a menu for the Applied Databases module, so I'll be reusing some of that code here.  
I also just like being able to navigate through a menu.

In [4]:
#layout of the menu
def show_menu():
    print("\nIris Dataset Analysis\n---------\n")
    print("MENU\n====")
    print("1. Summarise data")
    print("2. Generate Histogram")
    print("3. Generate Scatterplot")
    print("4. [placeholder]")
    print("x. Exit application")

show_menu()


Iris Dataset Analysis
---------

MENU
====
1. Summarise data
2. Generate Histogram
3. Generate Scatterplot
4. [placeholder]
x. Exit application
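
The menu still needs a loop that reads the user's choice and calls the matching option. A minimal sketch of one way to do that with a dictionary of functions is below. `run_menu`, `actions` and `read` are names I'm assuming for illustration, and the real version would also call `show_menu()` on each pass.

```python
def run_menu(actions, read=input):
    """Keep prompting for a choice and call the matching action until 'x' is entered."""
    while True:
        choice = read("Choice: ").strip().lower()
        if choice == "x":
            break
        action = actions.get(choice)
        if action is None:
            print("Invalid choice, please try again.")
        else:
            action()


# Example wiring (the option functions here are placeholders):
# run_menu({
#     "1": lambda: print("Summarise data"),
#     "2": lambda: print("Generate Histogram"),
#     "3": lambda: print("Generate Scatterplot"),
# })
```

Using a dictionary keeps the loop the same length no matter how many menu options are added later.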


### **Loading the iris dataset**  
This was the code I had for loading the iris dataset (from the Principles of Data Analytics module).

In [5]:
from sklearn.datasets import load_iris

#loading iris dataset
iris_data = load_iris()

#variable names
keys = iris_data.keys()
features = iris_data['data']
features_shape = iris_data['data'].shape
target = iris_data['target']
target_shape = iris_data['target'].shape
target_names = iris_data['target_names']
features_names = iris_data['feature_names']

Originally it was laid out like this because all the code was in one Jupyter notebook, but for this project I'll be structuring everything in separate files to keep the main analysis.py neat and easy to read. For example, I'll have a separate .py file for each menu option, one for loading the iris data, and others for helper functions. This also makes it easier to work on one piece of code at a time and keeps things less messy.

In [None]:
from sklearn.datasets import load_iris

#turning the load iris code into a function that can be called from any file
#a dictionary can be used to hold the "variables" so they can be looked up by key
#excluding some variables since they weren't used or can be added when needed
def load_iris_data():
    iris_data = load_iris()
    return {
        "data": iris_data.data,
        "target": iris_data.target,
        "target_names": iris_data.target_names,
        "feature_names": iris_data.feature_names
    }

### **Functions for getting iris information**  
These are functions that I had made for getting information from the iris dataset. I'm not sure yet if I'll use them for this project, but I'll include them anyway and find a use for them later.
Originally I had a function for each variable, like below:

In [None]:
#functions I used for getting information from iris dataset

def iris_features(data):
    features = data['data']
    print(f"\nFeatures of the data:")
    print(features)

def iris_features_shape(data):
    features_shape = data['data'].shape
    print(f"\nShape of the data:")
    print(features_shape)

def iris_target(data):
    target = data['target']
    print(f"\nTarget of the data:")
    print(target)

def iris_shape(data):
    target_shape = data['target'].shape
    print(f"\nTarget shape of the data:")
    print(target_shape)

def iris_target_names(data):
    target_names = data['target_names']
    print(f"\nTarget names of the data:")
    print(target_names)

def iris_features_names(data):
    features_names = data['feature_names']
    print(f"\nFeature names of the data:")
    print(features_names)

However, the functions were very similar and served the same purpose, so I wanted to try compressing them into a single function.

In [None]:
#Function for printing information on the iris dataset just for easier reading
#var would be something like data, target, feature_names etc.
def iris_info(name, var):
    print(f"\n{name} of the data:")
    print(var)
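
A short usage sketch of the combined function. The `sample` dictionary is a shortened stand-in for what `load_iris_data()` returns, and the `labels` mapping is just an assumption about how the headings might be paired with the keys.

```python
def iris_info(name, var):
    print(f"\n{name} of the data:")
    print(var)


# Stand-in for the dictionary returned by load_iris_data() (values shortened for illustration)
sample = {
    "target_names": ["setosa", "versicolor", "virginica"],
    "feature_names": ["sepal length (cm)", "sepal width (cm)"],
}

# Pair each dictionary key with the heading to print for it
labels = {"target_names": "Target names", "feature_names": "Feature names"}
for key, label in labels.items():
    iris_info(label, sample[key])
```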

The same applied to the functions I made for summarising the data (mean, min, max, median, std):

In [None]:
import numpy as np

#functions for summarizing the iris dataset
def iris_feature_means(data):
    features = data['data']
    feature_names = data['feature_names']
    print("\nMean of each feature:")
    for r, name in enumerate(feature_names):
        print(f"{name}: {np.mean(features[:, r]):.2f}")

def iris_feature_mins(data):
    features = data['data']
    feature_names = data['feature_names']
    print("\nMinimum of each feature:")
    for r, name in enumerate(feature_names):
        print(f"{name}: {np.min(features[:, r]):.2f}")

def iris_feature_maxs(data):
    features = data['data']
    feature_names = data['feature_names']
    print("\nMaximum of each feature:")
    for r, name in enumerate(feature_names):
        print(f"{name}: {np.max(features[:, r]):.2f}")

def iris_feature_stds(data):
    features = data['data']
    feature_names = data['feature_names']
    print("\nStandard deviation of each feature:")
    for r, name in enumerate(feature_names):
        print(f"{name}: {np.std(features[:, r]):.2f}")

def iris_feature_medians(data):
    features = data['data']
    feature_names = data['feature_names']
    print("\nMedian of each feature:")
    for r, name in enumerate(feature_names):
        print(f"{name}: {np.median(features[:, r]):.2f}")

Compressing them all into one function:

In [None]:
import numpy as np

#function for summarizing the iris dataset
def iris_features_summary(features, features_names, stat=""):
    #making a dictionary to pair each stat name with its numpy function
    #wanted this to have similar functionality to the iris_info function
    #where one function can be used for a similar purpose by calling different arguments
    stat_function = {

        "mean" : np.mean,
        "min" : np.min,
        "max" : np.max,
        "std" : np.std,
        "median" : np.median

    }

    #an unknown stat (including the default "") would otherwise raise a KeyError
    if stat not in stat_function:
        print(f"\nUnknown stat '{stat}'. Choose from: {', '.join(stat_function)}")
        return

    function = stat_function[stat]
    print(f"\n{stat.capitalize()} of each feature:")

    for r, name in enumerate(features_names):
        answer = function(features[:, r])
        print(f"{name}: {answer:.2f}")
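
One thing to keep in mind about the std entry: `np.std` divides by n by default (`ddof=0`, the population standard deviation), while pandas defaults to `ddof=1` (the sample standard deviation), so the two can give slightly different numbers for the same column. The values below are made up just to show the difference.

```python
import numpy as np

# Made-up values: mean is 2.5, squared deviations sum to 5.0
values = np.array([1.0, 2.0, 3.0, 4.0])

population_std = np.std(values)       # divides by n (ddof=0, NumPy's default)
sample_std = np.std(values, ddof=1)   # divides by n - 1

print(f"population std: {population_std:.4f}")
print(f"sample std:     {sample_std:.4f}")
```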