# Graphical output

We want to produce graphs for the data, including set_1 and set_2, which are defined at the end of 'yfin_IQR_outliers.ipynb'. We will use the graphs to see which kinds of analyses could be interesting.

We want to be deliberate about our choice of plots. First, we want to see how the general descriptives shape up. Histograms and frequency plots in general will help us see how the data is distributed. We saw that the scores span a wide range. We have not yet explored kurtosis. We should first look at how means and other scores are distributed. We should also produce boxplots to visually check the choice of 'set_2' over 'set_1'. It will also be interesting to compare different groups and to see what plots exploring associations look like.

## We will look at
1. Histograms 
    1. Means -> frequencies
    2. Annual means
    3. Range -> frequencies
2. Boxplots
    1. Range data
    2. High and low
3. Graphs to compare data
    1. Variations annually
    2. High and low
    3. Range and high
    4. Range and low
    5. Open and close

## Histograms

To make histograms of mean frequencies we need to segment the data into groups, so that we can count the means that fall within certain ranges. We saw in our descriptives that scores range from just over 1 USD to more than 6000 USD. We need an appropriate interval width to bin the data for frequencies. Arguably, intervals of 10 are too narrow, and may not be much more visually informative than intervals of 50, which may also not be helpful.

We will therefore start with a table of frequencies, so we can decide which intervals we need. Ultimately we want a table/dataframe that looks something like this:

| Mean Interval | Interval 1 | Interval 2 | 
|---------------|-------|-------|
| Set 1 | Count | Count | 
| Set 2 | Count | Count | 
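
As a rough sketch of how a table of this shape could be built, the snippet below counts values per interval with `np.histogram` and stacks the counts for both sets into one dataframe. The sample values, the helper name `interval_counts`, and the interval width of 100 are all assumptions made for illustration; they are not the real data or the width we will settle on.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the per-row means of each set (not the real data)
sample_set_1 = pd.Series([1.5, 12.0, 48.0, 75.0, 130.0, 620.0])
sample_set_2 = pd.Series([1.5, 12.0, 48.0, 75.0, 130.0])

def interval_counts(means, width, upper):
    """Count how many values fall into each interval of the given width."""
    edges = np.arange(0, upper + width, width)
    # np.histogram uses half-open bins [lo, hi), except the last bin, which is closed
    counts, _ = np.histogram(means, bins=edges)
    labels = [f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]
    return pd.Series(counts, index=labels)

# Rows are the sets, columns are the intervals, as in the table above
freq_table = pd.DataFrame({
    "Set 1": interval_counts(sample_set_1, width=100, upper=700),
    "Set 2": interval_counts(sample_set_2, width=100, upper=700),
}).T

print(freq_table)
```

Using the same edges for both sets keeps the columns aligned, so an interval that is empty in one set still shows up with a count of 0.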


In [2]:
# Import modules and variables we need 

## Libraries
import numpy as np
import matplotlib.pyplot as plt # plotting functions live in the pyplot submodule
import pandas as pd

## Original data file
yfin_csv = pd.read_csv(r'https://raw.githubusercontent.com/alexcrockett/Jupyter-Playground/personal/yfin_dataset/02-data/stock_details_5_years.csv')

## Past work
import yfin_descriptives_py
import yfin_group_range_py
import yfin_IQR_outliers_py

## Data
from yfin_descriptives_py import di_mean1, di_median1, di_max_min_range, di_var1, di_std1 # import dictionaries
from yfin_descriptives_py import x_1, x_2, x_3, x_4, x_5, x_6, x_7 # import arrays
from yfin_group_range_py import company_averages # Averages for each company each day
from yfin_group_range_py import company_ranges_mean # Mean range per company
from yfin_IQR_outliers_py import set_1, set_2 # so we can call our dataframes more easily
from yfin_IQR_outliers_py import median # median scores
from yfin_IQR_outliers_py import first_quartile, third_quartile # First and third quartiles
from yfin_IQR_outliers_py import yfin_restricted_set

                        Date        Open  ...  Stock Splits  Company
0  2018-11-29 00:00:00-05:00   43.829761  ...           0.0     AAPL
1  2018-11-29 00:00:00-05:00  104.769074  ...           0.0     MSFT
2  2018-11-29 00:00:00-05:00   54.176498  ...           0.0    GOOGL
3  2018-11-29 00:00:00-05:00   83.749496  ...           0.0     AMZN
4  2018-11-29 00:00:00-05:00   39.692784  ...           0.0     NVDA

[5 rows x 9 columns]


In [7]:
# Defining initial variables and finding the overall count

## Create Means lists

set_1 = yfin_csv.copy() # A copy of the original data
set_2 = yfin_restricted_set.copy() # A copy of the data without outliers

set_1['set_1_means'] = (set_1['High'] + set_1['Low']) / 2 # Here we are defining a new column called set_1_means
set_2['set_2_means'] = (set_2['High'] + set_2['Low']) / 2 # Here we are defining a new column called set_2_means


In [8]:
# Creating the dataframe

## First let's get the count, minimum, and maximum for each set
mean_freq_dict = {
    "Count": [set_1['set_1_means'].count(), set_2['set_2_means'].count()],
    "Minimum": [set_1['set_1_means'].min(), set_2['set_2_means'].min()],
    "Maximum": [set_1['set_1_means'].max(), set_2['set_2_means'].max()]
}

mean_freq_frame = pd.DataFrame(mean_freq_dict)

print(mean_freq_frame)

    Count   Minimum      Maximum
0  602962  1.043654  6459.459961
1  558932  1.043654   615.128419
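
To get a feel for which interval widths are workable, one option is to work out how many intervals each candidate width would need to cover each set, starting from 0. The sketch below does that arithmetic with the maxima printed above; the list of candidate widths is an assumption for illustration, not a decision.

```python
import math

# Maxima copied from the summary output above
maxima = {"Set 1": 6459.459961, "Set 2": 615.128419}

# Hypothetical candidate interval widths in USD
candidate_widths = [10, 50, 100, 500]

# Number of intervals needed to cover 0..maximum at each width
bins_needed = {
    name: {width: math.ceil(maximum / width) for width in candidate_widths}
    for name, maximum in maxima.items()
}

for name, per_width in bins_needed.items():
    print(name, per_width)
```

Because the maximum of set_1 is roughly ten times that of set_2, a width that gives a readable number of intervals for one set may be too coarse or too fine for the other.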
