# Homework Assignment 2: Data Prep
In this homework assignment, you will continue your exploration of the [SWAN-SF Dataset](https://doi.org/10.7910/DVN/EBCFKM), described in the paper found [here](https://doi.org/10.1038/s41597-020-0548-x).


This assignment will have you explore the cardinalities, number of missing values, detect outliers, handle missing values and outliers, and create data quality report for original and cleaned dataset.

## Step 1: Downloading the Data

This assignment will only be using [Partition 1](https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/EBCFKM/BMXYCB). Recall that in Homework 1, we started to construct the analytics base table for our [SWAN-SF Dataset](https://doi.org/10.7910/DVN/EBCFKM). In that assignment, we read the data from the two subdirectories, __FL__ and __NF__, of the __partition1__ direcotry. These two subdirectories represented the two classes of our target feature in the solar flare prediction problem we will be attempting to solve this semester. We then processed these samples of multivariate time series to construct descriptive features for each sample, and then placed them into our analytics base table.

In this assignment, you will be utilizing a set of extracted descriptive features much like what you were asked to construct in Homework 1. However, this dataset contains many more extracted features than you were asked to compute for Homework 1 (>800), so we need to explore the data to find data quality issues and identify ways to address these issues. Below are links to the full extracted feature dataset for partition 1 and a toy dataset to use for testing you functions.

- [Full Partition 1 feature dataset](http://dmlab.cs.gsu.edu/solar/data/partition1ExtractedFeatures.csv)
- [Toy Partition 1 feature dataset](http://dmlab.cs.gsu.edu/solar/data/toy_partition1ExtractedFeatures.csv)

Now that you have the extracted features csv files, you will load that data into a Pandas DataFrame using the [pandas.read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) method.  


In [4]:
import os
import glob
import pandas as pd
from pandas import DataFrame 
import numpy as np

### Read the dataset

In [5]:
data_dir = r'C:/Users/akash/OneDrive/Desktop/Data Science/hw2'
data_file = r'toy_partition1ExtractedFeatures.csv'

In [6]:
abt = pd.read_csv(os.path.join(data_dir, data_file), index_col=0)

### Q1 (25 points)

Write a function to extract the various pieces of a data quality report, for a specific attribute, and return a dataframe with this information.

 * 'Feature Name': Contains the time series statistical feature name
 
 * 'Cardinality': Contains the count of unique values for the feature
            
 * 'Non-null Count': Contains the number of non-null entries for the feature
            
 * 'Null Count': Contains the number of null or missing entries for the feature
            
 * 'Min': Contains the minimum value of the feature (Without considering the null or nan value)
 
 * '25th': Contains the first quartile (25%) of the feature values (Without considering the null/nan value)
 
 * 'Mean': Contains the mean of the feature values (Without considering the null/nan value)
 
 * '50th': Contains the median of the feature values (Without considering the null/nan value)
            
 * '75th': Contains the third quartile (75%) of the feature values (Without considering the null/nan value)
 
 * 'Max': Contains the maximum value of the feature (Without considering the null/nan value),
            
 * 'Std. Dev': Contains the standard deviation of the feature (Without considering the null/nan value)
 
In addition to the values above, you should identify the number of upper and lower outliers using the $val < Q1-1.5IQR$ and $val > Q3+1.5IQR$ outlier identification method. These added features should be called `Outlier Count Low` and `Outliers Count High` respectively.


 
 Some useful functions for this can be found at:
 
 * [Numpy.percentile](https://numpy.org/doc/stable/reference/generated/numpy.percentile.html)
 
 * [pandas.isna](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.isna.html)
 
 * [Numpy.mean](https://numpy.org/doc/stable/reference/generated/numpy.mean.html)
 
 * [Numpy.std](https://numpy.org/doc/stable/reference/generated/numpy.std.html)
 

In [2]:
def calc_summary_for(feature_name:str, data:DataFrame) -> DataFrame:
    summary_feature_names = ['Feature Name', 'Cardinality', 'Non-null Count', 'Null Count', 'Min', '25th', 'Mean', 
                             '50th', '75th', 'Max', 'Outlier Count Low', 'Outlier Count High' 'Std. Dev']
    IQR = (data[feature_name].quantile(q=.75)) - (data[feature_name].quantile(q=.25))*1.5 
    o_low = 0
    o_high = 0
    for i in data[feature_name].values:
        if i < (data[feature_name].quantile(q=.25) - IQR):
            o_low+= 1
        elif i > (data[feature_name].quantile(q=.75) + IQR):
            o_high+= 1
        else:
            continue

    
    
    df = pd.DataFrame()
    df['Feature Name'] = [feature_name]
    df['Cardinality'] = [data[feature_name].nunique()]
    df['Non-null Count'] = [data[feature_name].notnull().sum()]
    df['Null Count'] = [data[feature_name].isnull().sum()]
    df['Min'] = [data[feature_name].min()]
    df['25th'] = [data[feature_name].quantile(q=.25)]
    df['Mean'] = [data[feature_name].mean()]
    df['50th'] = [data[feature_name].quantile(q=.50)]
    df['75th'] = [data[feature_name].quantile(q=.75)]
    df['Max'] = [data[feature_name].max()]
    df['Outlier Count Low'] = [o_low]
    df['Outlier Count High'] = [o_high]
    df['Std.Dev'] = [data[feature_name].std()]
    df.set_index(['Feature Name'], inplace=True)
    
    print (df)
    return df

    
    
    # Your answer to Q1 goes here! Make sure to return a DataFrame with the features specified. 

NameError: name 'DataFrame' is not defined

In [3]:
import pandas as pd
abt = pd.read_csv(os.path.join(data_dir, data_file), index_col=0)
#results = calc_summary_for(feature_name, data)
print(abt)
summary_TOTUSJH_min = calc_summary_for('TOTUSJH_min', abt)

NameError: name 'data_dir' is not defined

### Q2 (25 points)
Using what you produced to answer Q1, you should now write a function to construct the data quality report for all of the numerical features of our dataset.  You should loop over all of the features in the analytics base table represented by the input feature dataset files from partition 1, with the exception of the first column (this is the index column if you read the file correctly), and the `id`, `lab`, `st`, and `et` columns.  

Your output from this function will be a DataFrame that has 1 row for each feature. 

In [14]:
import pandas as pd
from pandas import DataFrame 
def construct_quality_report(data:DataFrame) -> DataFrame: 
    excluded_columns = ['id', 'lab', 'st', 'et']
    
    summary_feature_names = ['Feature Name', 'Cardinality', 'Non-null Count', 'Null Count', 'Min', '25th', 'Mean', 
                             '50th', '75th', 'Max', 'Outlier Count Low', 'Outlier Count High']
    data.drop(columns = excluded_columns, inplace = True)
    the_list = []
    for feature_name,values in data.iteritems():
        the_list.append(calc_summary_for(feature_name,data))
        
    return pd.concat(the_list)
    
    # Your answer to Q2 goes here! Make sure to return a DataFrame with the features specified.

In [15]:
summary_table = construct_quality_report(abt)

              Cardinality  Non-null Count  Null Count        Min        25th  \
Feature Name                                                                   
TOTUSJH_min            50              50           0  13.292757  188.084597   

                     Mean         50th         75th          Max  \
Feature Name                                                       
TOTUSJH_min   1474.046501  1302.193667  2307.005998  4807.254827   

              Outlier Count Low      Std.Dev  
Feature Name                                  
TOTUSJH_min                   3  1396.206806  
              Cardinality  Non-null Count  Null Count        Min        25th  \
Feature Name                                                                   
TOTUSJH_max            48              50           0  29.150427  254.628298   

                     Mean         50th         75th          Max  \
Feature Name                                                       
TOTUSJH_max   1762.417043  1629.3804

                            Cardinality  Non-null Count  Null Count      Min  \
Feature Name                                                                   
TOTUSJH_difference_of_stds           50              50           0  0.11354   

                                25th       Mean      50th       75th  \
Feature Name                                                           
TOTUSJH_difference_of_stds  2.693618  27.410339  15.13461  38.353624   

                                  Max  Outlier Count Low    Std.Dev  
Feature Name                                                         
TOTUSJH_difference_of_stds  162.75267                  6  34.660943  
                            Cardinality  Non-null Count  Null Count       Min  \
Feature Name                                                                    
TOTUSJH_difference_of_vars           50              50           0  1.920643   

                                 25th         Mean         50th         75th  \
Feature 

                                 Cardinality  Non-null Count  Null Count  \
Feature Name                                                               
TOTUSJH_average_absolute_change           50              50           0   

                                      Min      25th       Mean       50th  \
Feature Name                                                                
TOTUSJH_average_absolute_change  1.363014  5.858084  18.746018  15.262293   

                                      75th         Max  Outlier Count Low  \
Feature Name                                                                
TOTUSJH_average_absolute_change  23.324948  133.718855                  3   

                                   Std.Dev  
Feature Name                                
TOTUSJH_average_absolute_change  24.042701  
                                            Cardinality  Non-null Count  \
Feature Name                                                              
TOTUSJH_average_absolu

              Cardinality  Non-null Count  Null Count           Min  \
Feature Name                                                          
TOTBSQ_var             50              50           0  2.068391e+14   

                      25th          Mean          50th          75th  \
Feature Name                                                           
TOTBSQ_var    3.790878e+16  1.780098e+18  2.370520e+17  1.268338e+18   

                       Max  Outlier Count Low       Std.Dev  
Feature Name                                                 
TOTBSQ_var    2.138121e+19                 10  4.056376e+18  
                 Cardinality  Non-null Count  Null Count       Min      25th  \
Feature Name                                                                   
TOTBSQ_skewness           50              50           0 -1.576371 -0.393113   

                     Mean      50th     75th       Max  Outlier Count Low  \
Feature Name                                                     

                         Cardinality  Non-null Count  Null Count  \
Feature Name                                                       
TOTBSQ_dderivative_mean           50              50           0   

                                  Min          25th          Mean  \
Feature Name                                                        
TOTBSQ_dderivative_mean -5.428790e+07 -2.474726e+06  2.438978e+07   

                                 50th          75th           Max  \
Feature Name                                                        
TOTBSQ_dderivative_mean  2.104384e+06  2.877844e+07  2.239146e+08   

                         Outlier Count Low       Std.Dev  
Feature Name                                              
TOTBSQ_dderivative_mean                 10  6.051613e+07  
                         Cardinality  Non-null Count  Null Count  \
Feature Name                                                       
TOTBSQ_gderivative_mean           50              50           0  

                                           Cardinality  Non-null Count  \
Feature Name                                                             
TOTBSQ_average_absolute_derivative_change           50              50   

                                           Null Count           Min  \
Feature Name                                                          
TOTBSQ_average_absolute_derivative_change           0  7.508714e+06   

                                                   25th          Mean  \
Feature Name                                                            
TOTBSQ_average_absolute_derivative_change  3.024953e+07  9.118937e+07   

                                                   50th          75th  \
Feature Name                                                            
TOTBSQ_average_absolute_derivative_change  7.157534e+07  1.090553e+08   

                                                    Max  Outlier Count Low  \
Feature Name                                 

               Cardinality  Non-null Count  Null Count           Min  \
Feature Name                                                           
TOTPOT_stddev           50              50           0  1.659122e+20   

                       25th          Mean          50th          75th  \
Feature Name                                                            
TOTPOT_stddev  2.128427e+21  1.774737e+22  8.535948e+21  2.106102e+22   

                        Max  Outlier Count Low       Std.Dev  
Feature Name                                                  
TOTPOT_stddev  1.317954e+23                  7  2.730152e+22  
              Cardinality  Non-null Count  Null Count           Min  \
Feature Name                                                          
TOTPOT_var             50              50           0  2.752685e+40   

                      25th          Mean          50th          75th  \
Feature Name                                                           
TOTPOT_var    4.

                              Cardinality  Non-null Count  Null Count  \
Feature Name                                                            
TOTPOT_difference_of_medians           50              50           0   

                                       Min          25th          Mean  \
Feature Name                                                             
TOTPOT_difference_of_medians  1.073626e+20  1.415332e+21  2.704222e+22   

                                      50th          75th           Max  \
Feature Name                                                             
TOTPOT_difference_of_medians  1.175150e+22  2.887074e+22  2.153813e+23   

                              Outlier Count Low       Std.Dev  
Feature Name                                                   
TOTPOT_difference_of_medians                  8  4.650643e+22  
                         Cardinality  Non-null Count  Null Count  \
Feature Name                                                       
TOTPOT

                                           Cardinality  Non-null Count  \
Feature Name                                                             
TOTPOT_average_absolute_derivative_change           50              50   

                                           Null Count           Min  \
Feature Name                                                          
TOTPOT_average_absolute_derivative_change           0  9.573925e+19   

                                                   25th          Mean  \
Feature Name                                                            
TOTPOT_average_absolute_derivative_change  5.386655e+20  4.363283e+21   

                                                   50th          75th  \
Feature Name                                                            
TOTPOT_average_absolute_derivative_change  2.047252e+21  3.214775e+21   

                                                    Max  Outlier Count Low  \
Feature Name                                 

                Cardinality  Non-null Count  Null Count           Min  \
Feature Name                                                            
TOTUSJZ_stddev           50              50           0  7.020140e+10   

                        25th          Mean          50th          75th  \
Feature Name                                                             
TOTUSJZ_stddev  5.511031e+11  1.648601e+12  1.263047e+12  2.115931e+12   

                         Max  Outlier Count Low       Std.Dev  
Feature Name                                                   
TOTUSJZ_stddev  5.374073e+12                  8  1.426944e+12  
              Cardinality  Non-null Count  Null Count           Min  \
Feature Name                                                          
TOTUSJZ_var            50              50           0  4.928236e+21   

                      25th          Mean          50th          75th  \
Feature Name                                                           
TOTUSJZ

TOTUSJZ_dderivative_mean                  8  8.687286e+10  
                          Cardinality  Non-null Count  Null Count  \
Feature Name                                                        
TOTUSJZ_gderivative_mean           50              50           0   

                                   Min          25th          Mean  \
Feature Name                                                         
TOTUSJZ_gderivative_mean -2.076211e+11 -9.814955e+09  3.298510e+10   

                                  50th          75th           Max  \
Feature Name                                                         
TOTUSJZ_gderivative_mean  1.576083e+10  5.708223e+10  3.025742e+11   

                          Outlier Count Low       Std.Dev  
Feature Name                                               
TOTUSJZ_gderivative_mean                  6  8.939238e+10  
                            Cardinality  Non-null Count  Null Count  \
Feature Name                                               

                                        Cardinality  Non-null Count  \
Feature Name                                                          
TOTUSJZ_slope_of_longest_mono_increase           47              50   

                                        Null Count           Min  \
Feature Name                                                       
TOTUSJZ_slope_of_longest_mono_increase           0  4.488248e+10   

                                                25th          Mean  \
Feature Name                                                         
TOTUSJZ_slope_of_longest_mono_increase  2.057419e+11  7.115954e+11   

                                                50th          75th  \
Feature Name                                                         
TOTUSJZ_slope_of_longest_mono_increase  4.967262e+11  8.618932e+11   

                                                 Max  Outlier Count Low  \
Feature Name                                                              
TOTUSJZ_s

                  Cardinality  Non-null Count  Null Count       Min      25th  \
Feature Name                                                                    
ABSNJZH_kurtosis           50              50           0 -1.359906 -0.884573   

                      Mean      50th      75th       Max  Outlier Count Low  \
Feature Name                                                                  
ABSNJZH_kurtosis -0.261597 -0.617953  0.026107  2.859802                  3   

                   Std.Dev  
Feature Name                
ABSNJZH_kurtosis  0.947122  
                           Cardinality  Non-null Count  Null Count  Min  25th  \
Feature Name                                                                    
ABSNJZH_no_zero_crossings            1              50           0  0.0   0.0   

                           Mean  50th  75th  Max  Outlier Count Low  Std.Dev  
Feature Name                                                                  
ABSNJZH_no_zero_crossings   0

ABSNJZH_gderivative_mean                  7  2.000534  
                            Cardinality  Non-null Count  Null Count      Min  \
Feature Name                                                                   
ABSNJZH_dderivative_stddev           50              50           0  1.16647   

                                25th       Mean       50th       75th  \
Feature Name                                                            
ABSNJZH_dderivative_stddev  4.646638  24.278305  12.007622  20.626375   

                                   Max  Outlier Count Low    Std.Dev  
Feature Name                                                          
ABSNJZH_dderivative_stddev  294.618023                  4  54.384961  
                            Cardinality  Non-null Count  Null Count       Min  \
Feature Name                                                                    
ABSNJZH_gderivative_stddev           50              50           0  0.648918   

                          

                                        Cardinality  Non-null Count  \
Feature Name                                                          
ABSNJZH_slope_of_longest_mono_increase           48              50   

                                        Null Count       Min      25th  \
Feature Name                                                             
ABSNJZH_slope_of_longest_mono_increase           0  1.074664  5.663576   

                                             Mean       50th      75th  \
Feature Name                                                             
ABSNJZH_slope_of_longest_mono_increase  31.137994  10.804147  23.49486   

                                               Max  Outlier Count Low  \
Feature Name                                                            
ABSNJZH_slope_of_longest_mono_increase  394.356556                  5   

                                          Std.Dev  
Feature Name                                       
ABSNJZH_slope_of

                           Cardinality  Non-null Count  Null Count  Min  25th  \
Feature Name                                                                    
SAVNCPP_no_zero_crossings            1              50           0  0.0   0.0   

                           Mean  50th  75th  Max  Outlier Count Low  Std.Dev  
Feature Name                                                                  
SAVNCPP_no_zero_crossings   0.0   0.0   0.0  0.0                  0      0.0  
                                 Cardinality  Non-null Count  Null Count  \
Feature Name                                                               
SAVNCPP_mean_local_maxima_value           50              50           0   

                                          Min          25th          Mean  \
Feature Name                                                                
SAVNCPP_mean_local_maxima_value  1.768593e+11  1.254952e+12  1.184655e+13   

                                         50th          75t

                            Cardinality  Non-null Count  Null Count  \
Feature Name                                                          
SAVNCPP_dderivative_stddev           50              50           0   

                                     Min          25th          Mean  \
Feature Name                                                           
SAVNCPP_dderivative_stddev  9.237686e+10  3.938399e+11  1.244685e+12   

                                    50th          75th           Max  \
Feature Name                                                           
SAVNCPP_dderivative_stddev  9.010405e+11  1.448592e+12  8.573704e+12   

                            Outlier Count Low       Std.Dev  
Feature Name                                                 
SAVNCPP_dderivative_stddev                  4  1.577023e+12  
                            Cardinality  Non-null Count  Null Count  \
Feature Name                                                          
SAVNCPP_gderivative_stdd

                    Cardinality  Non-null Count  Null Count           Min  \
Feature Name                                                                
SAVNCPP_last_value           50              50           0  3.137864e+10   

                            25th          Mean          50th          75th  \
Feature Name                                                                 
SAVNCPP_last_value  7.391772e+11  1.128168e+13  5.812096e+12  1.143980e+13   

                             Max  Outlier Count Low       Std.Dev  
Feature Name                                                       
SAVNCPP_last_value  7.938362e+13                  6  1.765043e+13  
                                        Cardinality  Non-null Count  \
Feature Name                                                          
SAVNCPP_slope_of_longest_mono_increase           48              50   

                                        Null Count           Min  \
Feature Name                                   

                 Cardinality  Non-null Count  Null Count       Min      25th  \
Feature Name                                                                   
USFLUX_skewness           50              50           0 -1.140295 -0.433527   

                     Mean      50th      75th       Max  Outlier Count Low  \
Feature Name                                                                 
USFLUX_skewness  0.096311  0.134513  0.549535  1.363673                  0   

                  Std.Dev  
Feature Name               
USFLUX_skewness  0.587358  
                 Cardinality  Non-null Count  Null Count       Min     25th  \
Feature Name                                                                  
USFLUX_kurtosis           50              50           0 -1.758068 -1.14887   

                     Mean      50th      75th       Max  Outlier Count Low  \
Feature Name                                                                 
USFLUX_kurtosis -0.635236 -0.917642 -0.357381 

                         Cardinality  Non-null Count  Null Count  \
Feature Name                                                       
USFLUX_gderivative_mean           50              50           0   

                                  Min          25th          Mean  \
Feature Name                                                        
USFLUX_gderivative_mean -4.038263e+19 -4.114421e+18  1.941553e+19   

                                 50th          75th           Max  \
Feature Name                                                        
USFLUX_gderivative_mean  8.412531e+18  2.940442e+19  2.042578e+20   

                         Outlier Count Low       Std.Dev  
Feature Name                                              
USFLUX_gderivative_mean                  6  4.471197e+19  
                           Cardinality  Non-null Count  Null Count  \
Feature Name                                                         
USFLUX_dderivative_stddev           50              50        

                   Cardinality  Non-null Count  Null Count           Min  \
Feature Name                                                               
USFLUX_last_value           50              50           0  4.282162e+20   

                           25th          Mean          50th          75th  \
Feature Name                                                                
USFLUX_last_value  2.667321e+21  2.059263e+22  1.576004e+22  3.555641e+22   

                            Max  Outlier Count Low       Std.Dev  
Feature Name                                                      
USFLUX_last_value  6.288381e+22                  0  1.872933e+22  
                                       Cardinality  Non-null Count  \
Feature Name                                                         
USFLUX_slope_of_longest_mono_increase           48              50   

                                       Null Count           Min          25th  \
Feature Name                                  

                Cardinality  Non-null Count  Null Count       Min      25th  \
Feature Name                                                                  
TOTFZ_skewness           50              50           0 -1.224638 -0.389203   

                    Mean     50th      75th       Max  Outlier Count Low  \
Feature Name                                                               
TOTFZ_skewness -0.016705 -0.09526  0.312539  1.462223                  2   

                 Std.Dev  
Feature Name              
TOTFZ_skewness  0.577533  
                Cardinality  Non-null Count  Null Count       Min      25th  \
Feature Name                                                                  
TOTFZ_kurtosis           50              50           0 -1.535375 -1.286723   

                    Mean      50th      75th       Max  Outlier Count Low  \
Feature Name                                                                
TOTFZ_kurtosis -0.680406 -1.010417 -0.512138  2.030987      

                        Cardinality  Non-null Count  Null Count           Min  \
Feature Name                                                                    
TOTFZ_dderivative_mean           50              50           0 -8.936954e+22   

                                25th          Mean          50th  \
Feature Name                                                       
TOTFZ_dderivative_mean -1.356294e+22 -2.702962e+21 -3.385273e+21   

                                75th           Max  Outlier Count Low  \
Feature Name                                                            
TOTFZ_dderivative_mean  7.233903e+21  1.120728e+23                  4   

                             Std.Dev  
Feature Name                          
TOTFZ_dderivative_mean  3.142987e+22  
                        Cardinality  Non-null Count  Null Count           Min  \
Feature Name                                                                    
TOTFZ_gderivative_mean           50              50 

                  Cardinality  Non-null Count  Null Count           Min  \
Feature Name                                                              
TOTFZ_last_value           50              50           0 -2.376960e+25   

                          25th          Mean          50th          75th  \
Feature Name                                                               
TOTFZ_last_value -1.179407e+25 -6.387613e+24 -3.513015e+24 -1.221361e+24   

                           Max  Outlier Count Low       Std.Dev  
Feature Name                                                     
TOTFZ_last_value -1.553723e+23                  0  6.779177e+24  
                                      Cardinality  Non-null Count  Null Count  \
Feature Name                                                                    
TOTFZ_slope_of_longest_mono_increase           49              50           0   

                                               Min          25th  \
Feature Name                       

                           Cardinality  Non-null Count  Null Count  Min  25th  \
Feature Name                                                                    
MEANPOT_no_zero_crossings            1              50           0  0.0   0.0   

                           Mean  50th  75th  Max  Outlier Count Low  Std.Dev  
Feature Name                                                                  
MEANPOT_no_zero_crossings   0.0   0.0   0.0  0.0                  0      0.0  
                                 Cardinality  Non-null Count  Null Count  \
Feature Name                                                               
MEANPOT_mean_local_maxima_value           50              50           0   

                                         Min         25th         Mean  \
Feature Name                                                             
MEANPOT_mean_local_maxima_value  1328.299754  2246.121098  7362.334702   

                                        50th          75th         

                            Cardinality  Non-null Count  Null Count  \
Feature Name                                                          
MEANPOT_gderivative_stddev           50              50           0   

                                  Min       25th       Mean       50th  \
Feature Name                                                             
MEANPOT_gderivative_stddev  24.255193  42.713454  93.548193  78.992669   

                                  75th       Max  Outlier Count Low   Std.Dev  
Feature Name                                                                   
MEANPOT_gderivative_stddev  112.604122  348.4994                  7  68.42382  
                              Cardinality  Non-null Count  Null Count  \
Feature Name                                                            
MEANPOT_dderivative_skewness           50              50           0   

                                   Min      25th      Mean     50th      75th  \
Feature Name          

MEANPOT_slope_of_longest_mono_increase  160.843843  
                                        Cardinality  Non-null Count  \
Feature Name                                                          
MEANPOT_slope_of_longest_mono_decrease           48              50   

                                        Null Count        Min      25th  \
Feature Name                                                              
MEANPOT_slope_of_longest_mono_decrease           0 -31.327324 -9.537605   

                                            Mean     50th      75th       Max  \
Feature Name                                                                    
MEANPOT_slope_of_longest_mono_decrease -7.114404 -4.66368 -2.083321 -0.153015   

                                        Outlier Count Low   Std.Dev  
Feature Name                                                         
MEANPOT_slope_of_longest_mono_decrease                  0  7.026737  
                                 Cardinality  Non-nul

                          Cardinality  Non-null Count  Null Count       Min  \
Feature Name                                                                  
EPSZ_difference_of_means           50              50           0  0.000724   

                              25th      Mean      50th      75th       Max  \
Feature Name                                                                 
EPSZ_difference_of_means  0.007424  0.023932  0.017927  0.034909  0.139547   

                          Outlier Count Low  Std.Dev  
Feature Name                                          
EPSZ_difference_of_means                  2  0.02273  
                         Cardinality  Non-null Count  Null Count       Min  \
Feature Name                                                                 
EPSZ_difference_of_stds           50              50           0  0.000011   

                             25th      Mean      50th      75th       Max  \
Feature Name                                      

                                 Cardinality  Non-null Count  Null Count  \
Feature Name                                                               
EPSZ_quadratic_weighted_average           50              50           0   

                                      Min      25th      Mean      50th  \
Feature Name                                                              
EPSZ_quadratic_weighted_average -0.752007 -0.556739 -0.351029 -0.356501   

                                     75th       Max  Outlier Count Low  \
Feature Name                                                             
EPSZ_quadratic_weighted_average -0.158006 -0.013337                  0   

                                  Std.Dev  
Feature Name                               
EPSZ_quadratic_weighted_average  0.210587  
                              Cardinality  Non-null Count  Null Count  \
Feature Name                                                            
EPSZ_average_absolute_change           50   

                Cardinality  Non-null Count  Null Count       Min      25th  \
Feature Name                                                                  
MEANSHR_stddev           50              50           0  0.284178  0.536008   

                    Mean      50th      75th       Max  Outlier Count Low  \
Feature Name                                                                
MEANSHR_stddev  0.896422  0.701996  0.966384  4.021742                  8   

                 Std.Dev  
Feature Name              
MEANSHR_stddev  0.652512  
              Cardinality  Non-null Count  Null Count       Min      25th  \
Feature Name                                                                
MEANSHR_var            50              50           0  0.080757  0.287481   

                  Mean      50th      75th        Max  Outlier Count Low  \
Feature Name                                                               
MEANSHR_var   1.220829  0.492803  0.933902  16.174406            

                          Cardinality  Non-null Count  Null Count       Min  \
Feature Name                                                                  
MEANSHR_gderivative_mean           50              50           0 -0.068383   

                              25th      Mean      50th      75th       Max  \
Feature Name                                                                 
MEANSHR_gderivative_mean -0.027114  0.003349  0.009746  0.025086  0.107995   

                          Outlier Count Low   Std.Dev  
Feature Name                                           
MEANSHR_gderivative_mean                  1  0.037828  
                            Cardinality  Non-null Count  Null Count      Min  \
Feature Name                                                                   
MEANSHR_dderivative_stddev           50              50           0  0.11498   

                                25th      Mean     50th     75th       Max  \
Feature Name                            

                    Cardinality  Non-null Count  Null Count        Min  \
Feature Name                                                             
MEANSHR_last_value           50              50           0  18.177012   

                         25th      Mean       50th      75th        Max  \
Feature Name                                                              
MEANSHR_last_value  25.788104  34.25479  33.413404  42.34929  52.772347   

                    Outlier Count Low   Std.Dev  
Feature Name                                     
MEANSHR_last_value                  7  9.674636  
                                        Cardinality  Non-null Count  \
Feature Name                                                          
MEANSHR_slope_of_longest_mono_increase           49              50   

                                        Null Count       Min      25th  \
Feature Name                                                             
MEANSHR_slope_of_longest_mono_increase 

                                 Cardinality  Non-null Count  Null Count  \
Feature Name                                                               
SHRGT45_mean_local_minima_value           50              50           0   

                                     Min      25th       Mean       50th  \
Feature Name                                                               
SHRGT45_mean_local_minima_value  0.39638  6.533076  24.782581  22.036355   

                                      75th        Max  Outlier Count Low  \
Feature Name                                                               
SHRGT45_mean_local_minima_value  40.236654  58.599288                  0   

                                   Std.Dev  
Feature Name                                
SHRGT45_mean_local_minima_value  18.179785  
                            Cardinality  Non-null Count  Null Count       Min  \
Feature Name                                                                    
SHRGT45_differen

                              Cardinality  Non-null Count  Null Count  \
Feature Name                                                            
SHRGT45_gderivative_kurtosis           50              50           0   

                                   Min      25th      Mean      50th  \
Feature Name                                                           
SHRGT45_gderivative_kurtosis -0.752085 -0.406265  0.496067 -0.042069   

                                  75th       Max  Outlier Count Low  Std.Dev  
Feature Name                                                                  
SHRGT45_gderivative_kurtosis  0.560218  9.358573                  5  1.72316  
                                 Cardinality  Non-null Count  Null Count  \
Feature Name                                                               
SHRGT45_linear_weighted_average           50              50           0   

                                      Min      25th       Mean      50th  \
Feature Name         

                Cardinality  Non-null Count  Null Count        Min       25th  \
Feature Name                                                                    
MEANGAM_median           50              50           0  23.368021  32.810013   

                     Mean       50th       75th        Max  Outlier Count Low  \
Feature Name                                                                    
MEANGAM_median  41.281336  39.878798  50.890278  60.479096                  8   

                 Std.Dev  
Feature Name              
MEANGAM_median  10.52563  
              Cardinality  Non-null Count  Null Count        Min       25th  \
Feature Name                                                                  
MEANGAM_mean           50              50           0  23.441184  32.691972   

                   Mean       50th       75th        Max  Outlier Count Low  \
Feature Name                                                                  
MEANGAM_mean  41.240038  39.426878 

                               Cardinality  Non-null Count  Null Count  \
Feature Name                                                             
MEANGAM_difference_of_medians           50              50           0   

                                    Min     25th      Mean      50th     75th  \
Feature Name                                                                    
MEANGAM_difference_of_medians  0.074061  0.30791  1.290327  0.827612  1.79655   

                                     Max  Outlier Count Low   Std.Dev  
Feature Name                                                           
MEANGAM_difference_of_medians  10.036579                  3  1.564162  
                          Cardinality  Non-null Count  Null Count       Min  \
Feature Name                                                                  
MEANGAM_dderivative_mean           50              50           0 -0.108848   

                             25th     Mean      50th     75th       Max  \
Feat

                                            Cardinality  Non-null Count  \
Feature Name                                                              
MEANGAM_average_absolute_derivative_change           50              50   

                                            Null Count       Min      25th  \
Feature Name                                                                 
MEANGAM_average_absolute_derivative_change           0  0.090454  0.214389   

                                                Mean      50th      75th  \
Feature Name                                                               
MEANGAM_average_absolute_derivative_change  0.461687  0.347995  0.615528   

                                                 Max  Outlier Count Low  \
Feature Name                                                              
MEANGAM_average_absolute_derivative_change  1.712855                  4   

                                             Std.Dev  
Feature Name                 

                  Cardinality  Non-null Count  Null Count       Min     25th  \
Feature Name                                                                   
MEANGBT_kurtosis           50              50           0 -1.471661 -0.92858   

                      Mean      50th     75th       Max  Outlier Count Low  \
Feature Name                                                                 
MEANGBT_kurtosis -0.644765 -0.634976 -0.38421  0.883491                  1   

                   Std.Dev  
Feature Name                
MEANGBT_kurtosis  0.473828  
                           Cardinality  Non-null Count  Null Count  Min  25th  \
Feature Name                                                                    
MEANGBT_no_zero_crossings            1              50           0  0.0   0.0   

                           Mean  50th  75th  Max  Outlier Count Low  Std.Dev  
Feature Name                                                                  
MEANGBT_no_zero_crossings   0.0   0

                              Cardinality  Non-null Count  Null Count  \
Feature Name                                                            
MEANGBT_gderivative_skewness           50              50           0   

                                  Min      25th      Mean      50th      75th  \
Feature Name                                                                    
MEANGBT_gderivative_skewness -2.26175 -0.148926  0.042989  0.017183  0.290371   

                                   Max  Outlier Count Low   Std.Dev  
Feature Name                                                         
MEANGBT_gderivative_skewness  2.656267                  2  0.599849  
                              Cardinality  Non-null Count  Null Count  \
Feature Name                                                            
MEANGBT_dderivative_kurtosis           50              50           0   

                                   Min      25th      Mean      50th  \
Feature Name                      

MEANGBT_avg_mono_decrease_slope  0.511604  
              Cardinality  Non-null Count  Null Count        Min       25th  \
Feature Name                                                                  
MEANGBZ_min            49              50           0  73.465523  90.812687   

                    Mean        50th        75th         Max  \
Feature Name                                                   
MEANGBZ_min   102.109982  100.387771  112.567542  131.806625   

              Outlier Count Low    Std.Dev  
Feature Name                                
MEANGBZ_min                  12  15.543038  
              Cardinality  Non-null Count  Null Count       Min        25th  \
Feature Name                                                                  
MEANGBZ_max            49              50           0  83.34995  101.472778   

                    Mean        50th        75th         Max  \
Feature Name                                                   
MEANGBZ_max   117.272556

MEANGBZ_difference_of_means  17.621579                  3  3.101249  
                            Cardinality  Non-null Count  Null Count      Min  \
Feature Name                                                                   
MEANGBZ_difference_of_stds           50              50           0  0.00238   

                                25th     Mean      50th      75th       Max  \
Feature Name                                                                  
MEANGBZ_difference_of_stds  0.385027  1.19096  0.770988  1.332546  6.568862   

                            Outlier Count Low   Std.Dev  
Feature Name                                             
MEANGBZ_difference_of_stds                  7  1.366334  
                            Cardinality  Non-null Count  Null Count       Min  \
Feature Name                                                                    
MEANGBZ_difference_of_vars           50              50           0  0.016794   

                                2

                                 Cardinality  Non-null Count  Null Count  \
Feature Name                                                               
MEANGBZ_average_absolute_change           50              50           0   

                                      Min      25th      Mean     50th  \
Feature Name                                                             
MEANGBZ_average_absolute_change  0.363575  0.572042  1.190456  0.86439   

                                     75th      Max  Outlier Count Low  \
Feature Name                                                            
MEANGBZ_average_absolute_change  1.775917  3.20305                  5   

                                  Std.Dev  
Feature Name                               
MEANGBZ_average_absolute_change  0.817676  
                                            Cardinality  Non-null Count  \
Feature Name                                                              
MEANGBZ_average_absolute_derivative_change    

              Cardinality  Non-null Count  Null Count      Min      25th  \
Feature Name                                                               
MEANGBH_var            50              50           0  0.33369  1.942555   

                  Mean      50th      75th        Max  Outlier Count Low  \
Feature Name                                                               
MEANGBH_var   8.434149  3.637917  8.716534  65.446713                  8   

                Std.Dev  
Feature Name             
MEANGBH_var   12.277405  
                  Cardinality  Non-null Count  Null Count       Min      25th  \
Feature Name                                                                    
MEANGBH_skewness           50              50           0 -0.860736 -0.277634   

                      Mean      50th      75th       Max  Outlier Count Low  \
Feature Name                                                                  
MEANGBH_skewness  0.097783  0.132953  0.407811  1.242742      

                            Cardinality  Non-null Count  Null Count       Min  \
Feature Name                                                                    
MEANGBH_gderivative_stddev           50              50           0  0.116708   

                                25th      Mean      50th      75th       Max  \
Feature Name                                                                   
MEANGBH_gderivative_stddev  0.325134  0.534685  0.400943  0.722203  1.359556   

                            Outlier Count Low   Std.Dev  
Feature Name                                             
MEANGBH_gderivative_stddev                  5  0.326499  
                              Cardinality  Non-null Count  Null Count  \
Feature Name                                                            
MEANGBH_dderivative_skewness           50              50           0   

                                   Min     25th      Mean      50th      75th  \
Feature Name                            

MEANGBH_avg_mono_increase_slope  0.247951  
                                 Cardinality  Non-null Count  Null Count  \
Feature Name                                                               
MEANGBH_avg_mono_decrease_slope           50              50           0   

                                      Min      25th      Mean      50th  \
Feature Name                                                              
MEANGBH_avg_mono_decrease_slope -1.184749 -0.462565 -0.355773 -0.257915   

                                     75th       Max  Outlier Count Low  \
Feature Name                                                             
MEANGBH_avg_mono_decrease_slope -0.181714 -0.090682                  0   

                                  Std.Dev  
Feature Name                               
MEANGBH_avg_mono_decrease_slope  0.257534  
              Cardinality  Non-null Count  Null Count       Min      25th  \
Feature Name                                                         

                            Cardinality  Non-null Count  Null Count       Min  \
Feature Name                                                                    
MEANJZH_difference_of_stds           50              50           0  0.000003   

                                25th      Mean      50th      75th       Max  \
Feature Name                                                                   
MEANJZH_difference_of_stds  0.000101  0.000565  0.000361  0.000784  0.002724   

                            Outlier Count Low   Std.Dev  
Feature Name                                             
MEANJZH_difference_of_stds                  6  0.000617  
                            Cardinality  Non-null Count  Null Count  \
Feature Name                                                          
MEANJZH_difference_of_vars           50              50           0   

                                     Min          25th      Mean  \
Feature Name                                               

                                            Cardinality  Non-null Count  \
Feature Name                                                              
MEANJZH_average_absolute_derivative_change           50              50   

                                            Null Count       Min      25th  \
Feature Name                                                                 
MEANJZH_average_absolute_derivative_change           0  0.000238  0.000411   

                                                Mean      50th      75th  \
Feature Name                                                               
MEANJZH_average_absolute_derivative_change  0.001179  0.000811  0.001443   

                                                 Max  Outlier Count Low  \
Feature Name                                                              
MEANJZH_average_absolute_derivative_change  0.005103                  7   

                                             Std.Dev  
Feature Name                 

                         Cardinality  Non-null Count  Null Count  Min  25th  \
Feature Name                                                                  
TOTFY_no_zero_crossings            5              50           0  0.0   0.0   

                         Mean  50th  75th   Max  Outlier Count Low   Std.Dev  
Feature Name                                                                  
TOTFY_no_zero_crossings  0.78   0.0   0.0  11.0                  6  2.597408  
                               Cardinality  Non-null Count  Null Count  \
Feature Name                                                             
TOTFY_mean_local_maxima_value           50              50           0   

                                        Min          25th          Mean  \
Feature Name                                                              
TOTFY_mean_local_maxima_value -1.105093e+24 -8.947032e+22  6.726797e+23   

                                       50th          75th           Max  \
F

                          Cardinality  Non-null Count  Null Count  \
Feature Name                                                        
TOTFY_gderivative_stddev           50              50           0   

                                   Min          25th          Mean  \
Feature Name                                                         
TOTFY_gderivative_stddev  1.723711e+21  6.562519e+21  2.423812e+22   

                                  50th          75th           Max  \
Feature Name                                                         
TOTFY_gderivative_stddev  1.629143e+22  2.584198e+22  1.738762e+23   

                          Outlier Count Low       Std.Dev  
Feature Name                                               
TOTFY_gderivative_stddev                  5  3.351509e+22  
                            Cardinality  Non-null Count  Null Count      Min  \
Feature Name                                                                   
TOTFY_dderivative_skewness    

                               Cardinality  Non-null Count  Null Count  \
Feature Name                                                             
TOTFY_avg_mono_increase_slope           50              50           0   

                                        Min          25th          Mean  \
Feature Name                                                              
TOTFY_avg_mono_increase_slope  9.318006e+20  4.200804e+21  1.794023e+22   

                                       50th          75th           Max  \
Feature Name                                                              
TOTFY_avg_mono_increase_slope  1.116076e+22  1.612430e+22  1.775864e+23   

                               Outlier Count Low       Std.Dev  
Feature Name                                                    
TOTFY_avg_mono_increase_slope                  6  3.350357e+22  
                               Cardinality  Non-null Count  Null Count  \
Feature Name                                            

                            Cardinality  Non-null Count  Null Count       Min  \
Feature Name                                                                    
MEANJZD_difference_of_maxs           50              50           0  0.003341   

                                25th      Mean      50th      75th       Max  \
Feature Name                                                                   
MEANJZD_difference_of_maxs  0.022486  0.098935  0.045355  0.106162  0.859243   

                            Outlier Count Low   Std.Dev  
Feature Name                                             
MEANJZD_difference_of_maxs                  7  0.144923  
                             Cardinality  Non-null Count  Null Count  \
Feature Name                                                           
MEANJZD_difference_of_means           50              50           0   

                                  Min      25th      Mean      50th      75th  \
Feature Name                               

                                 Cardinality  Non-null Count  Null Count  \
Feature Name                                                               
MEANJZD_linear_weighted_average           50              50           0   

                                     Min      25th      Mean      50th  \
Feature Name                                                             
MEANJZD_linear_weighted_average -0.90987 -0.004104  0.164788  0.075848   

                                     75th       Max  Outlier Count Low  \
Feature Name                                                             
MEANJZD_linear_weighted_average  0.313344  1.130472                  4   

                                  Std.Dev  
Feature Name                               
MEANJZD_linear_weighted_average  0.319538  
                                    Cardinality  Non-null Count  Null Count  \
Feature Name                                                                  
MEANJZD_quadratic_weighted_average 

                Cardinality  Non-null Count  Null Count       Min      25th  \
Feature Name                                                                  
MEANALP_median           50              50           0 -0.052197 -0.016319   

                   Mean      50th      75th       Max  Outlier Count Low  \
Feature Name                                                               
MEANALP_median  0.00028  0.001915  0.017219  0.079676                  1   

                 Std.Dev  
Feature Name              
MEANALP_median  0.026057  
              Cardinality  Non-null Count  Null Count       Min     25th  \
Feature Name                                                               
MEANALP_mean           50              50           0 -0.052147 -0.01609   

                  Mean     50th      75th       Max  Outlier Count Low  \
Feature Name                                                             
MEANALP_mean  0.000086  0.00155  0.017713  0.079539                  1   



MEANALP_difference_of_vars  0.000018  0.000137                  7  0.000031  
                               Cardinality  Non-null Count  Null Count  \
Feature Name                                                             
MEANALP_difference_of_medians           50              50           0   

                                    Min      25th      Mean      50th  \
Feature Name                                                            
MEANALP_difference_of_medians  0.000104  0.001281  0.004846  0.003848   

                                   75th       Max  Outlier Count Low   Std.Dev  
Feature Name                                                                    
MEANALP_difference_of_medians  0.006618  0.022707                  5  0.004724  
                          Cardinality  Non-null Count  Null Count       Min  \
Feature Name                                                                  
MEANALP_dderivative_mean           50              50           0 -0.000709   

                                        Cardinality  Non-null Count  \
Feature Name                                                          
MEANALP_slope_of_longest_mono_increase           48              50   

                                        Null Count       Min      25th  \
Feature Name                                                             
MEANALP_slope_of_longest_mono_increase           0  0.000406  0.001089   

                                            Mean      50th      75th  \
Feature Name                                                           
MEANALP_slope_of_longest_mono_increase  0.003523  0.002213  0.004916   

                                             Max  Outlier Count Low   Std.Dev  
Feature Name                                                                   
MEANALP_slope_of_longest_mono_increase  0.015072                  5  0.003285  
                                        Cardinality  Non-null Count  \
Feature Name                       

TOTFX_mean_local_maxima_value                  8  3.402001e+24  
                               Cardinality  Non-null Count  Null Count  \
Feature Name                                                             
TOTFX_mean_local_minima_value           50              50           0   

                                        Min          25th          Mean  \
Feature Name                                                              
TOTFX_mean_local_minima_value -6.162757e+24 -1.232459e+24  3.830555e+23   

                                       50th          75th           Max  \
Feature Name                                                              
TOTFX_mean_local_minima_value -7.372461e+22  1.164824e+24  8.289855e+24   

                               Outlier Count Low       Std.Dev  
Feature Name                                                    
TOTFX_mean_local_minima_value                  8  3.290080e+24  
                          Cardinality  Non-null Count  Null Count

TOTFX_dderivative_skewness                  3  0.537672  
                            Cardinality  Non-null Count  Null Count      Min  \
Feature Name                                                                   
TOTFX_gderivative_skewness           50              50           0 -1.91561   

                                25th      Mean      50th      75th       Max  \
Feature Name                                                                   
TOTFX_gderivative_skewness -0.227895 -0.006511 -0.002142  0.291158  0.992703   

                            Outlier Count Low   Std.Dev  
Feature Name                                             
TOTFX_gderivative_skewness                  2  0.496138  
                            Cardinality  Non-null Count  Null Count       Min  \
Feature Name                                                                    
TOTFX_dderivative_kurtosis           50              50           0 -0.896023   

                                25th      

TOTFX_avg_mono_increase_slope                  7  8.012237e+22  
                               Cardinality  Non-null Count  Null Count  \
Feature Name                                                             
TOTFX_avg_mono_decrease_slope           50              50           0   

                                        Min          25th          Mean  \
Feature Name                                                              
TOTFX_avg_mono_decrease_slope -3.975349e+23 -2.338215e+22 -3.099760e+22   

                                       50th          75th           Max  \
Feature Name                                                              
TOTFX_avg_mono_decrease_slope -1.542524e+22 -4.726399e+21 -1.083990e+21   

                               Outlier Count Low       Std.Dev  
Feature Name                                                    
TOTFX_avg_mono_decrease_slope                  0  7.200191e+22  
              Cardinality  Non-null Count  Null Count       Min  

                          Cardinality  Non-null Count  Null Count       Min  \
Feature Name                                                                  
EPSY_difference_of_means           50              50           0  0.000007   

                              25th      Mean      50th      75th       Max  \
Feature Name                                                                 
EPSY_difference_of_means  0.001489  0.006898  0.004427  0.008288  0.026211   

                          Outlier Count Low   Std.Dev  
Feature Name                                           
EPSY_difference_of_means                  9  0.007391  
                         Cardinality  Non-null Count  Null Count       Min  \
Feature Name                                                                 
EPSY_difference_of_stds           50              50           0  0.000064   

                             25th      Mean     50th      75th       Max  \
Feature Name                                    

                              Cardinality  Non-null Count  Null Count  \
Feature Name                                                            
EPSY_average_absolute_change           50              50           0   

                                   Min      25th      Mean      50th  \
Feature Name                                                           
EPSY_average_absolute_change  0.000352  0.000643  0.001881  0.001008   

                                  75th       Max  Outlier Count Low   Std.Dev  
Feature Name                                                                   
EPSY_average_absolute_change  0.002914  0.006166                  4  0.001627  
                                         Cardinality  Non-null Count  \
Feature Name                                                           
EPSY_average_absolute_derivative_change           50              50   

                                         Null Count       Min      25th  \
Feature Name                   

               Cardinality  Non-null Count  Null Count      Min     25th  \
Feature Name                                                               
EPSX_skewness           50              50           0 -1.46143 -0.15346   

                   Mean      50th      75th       Max  Outlier Count Low  \
Feature Name                                                               
EPSX_skewness  0.112465  0.112511  0.335116  1.046957                  3   

                Std.Dev  
Feature Name             
EPSX_skewness  0.464043  
               Cardinality  Non-null Count  Null Count       Min      25th  \
Feature Name                                                                 
EPSX_kurtosis           50              50           0 -1.501509 -1.175366   

                   Mean      50th      75th       Max  Outlier Count Low  \
Feature Name                                                               
EPSX_kurtosis -0.808073 -0.915536 -0.636932  1.739601                  2   



                           Cardinality  Non-null Count  Null Count       Min  \
Feature Name                                                                   
EPSX_dderivative_skewness           50              50           0 -1.129076   

                               25th      Mean     50th      75th       Max  \
Feature Name                                                                 
EPSX_dderivative_skewness -0.111385  0.010812  0.04115  0.211767  0.622376   

                           Outlier Count Low   Std.Dev  
Feature Name                                            
EPSX_dderivative_skewness                  1  0.333726  
                           Cardinality  Non-null Count  Null Count       Min  \
Feature Name                                                                   
EPSX_gderivative_skewness           50              50           0 -0.651365   

                               25th      Mean      50th      75th       Max  \
Feature Name                     

              Cardinality  Non-null Count  Null Count  Min  25th      Mean  \
Feature Name                                                                 
R_VALUE_min            35              50           0  0.0   0.0  2.996308   

                  50th      75th       Max  Outlier Count Low   Std.Dev  
Feature Name                                                             
R_VALUE_min   4.256409  4.700478  5.519059                  0  2.152525  
              Cardinality  Non-null Count  Null Count  Min      25th  \
Feature Name                                                           
R_VALUE_max            46              50           0  0.0  3.143049   

                  Mean      50th      75th       Max  Outlier Count Low  \
Feature Name                                                              
R_VALUE_max   3.857418  4.500024  4.872757  5.628571                  7   

               Std.Dev  
Feature Name            
R_VALUE_max   1.577087  
                Cardinali

                            Cardinality  Non-null Count  Null Count  Min  \
Feature Name                                                               
R_VALUE_difference_of_stds           46              50           0  0.0   

                                25th      Mean      50th      75th       Max  \
Feature Name                                                                   
R_VALUE_difference_of_stds  0.008352  0.109151  0.017795  0.074421  1.090285   

                            Outlier Count Low   Std.Dev  
Feature Name                                             
R_VALUE_difference_of_stds                 10  0.213645  
                            Cardinality  Non-null Count  Null Count  Min  \
Feature Name                                                               
R_VALUE_difference_of_vars           46              50           0  0.0   

                                25th      Mean      50th      75th       Max  \
Feature Name                                   

                                 Cardinality  Non-null Count  Null Count  Min  \
Feature Name                                                                    
R_VALUE_average_absolute_change           46              50           0  0.0   

                                     25th    Mean      50th      75th  \
Feature Name                                                            
R_VALUE_average_absolute_change  0.013259  0.0758  0.020158  0.067188   

                                      Max  Outlier Count Low   Std.Dev  
Feature Name                                                            
R_VALUE_average_absolute_change  0.537999                 10  0.119396  
                                            Cardinality  Non-null Count  \
Feature Name                                                              
R_VALUE_average_absolute_derivative_change           46              50   

                                            Null Count  Min      25th  \
Feature Name      

### Q3 (25 points)
#### Drop features with improper cardinality:
Using the quality report summary table that is returned from the function you wrote for Q2, we are now going to investigate our data. For this, you should use the table returned for the [Full Partition 1 feature dataset](http://dmlab.cs.gsu.edu/solar/data/partition1ExtractedFeatures.csv) and not the toy dataset I provided for testing.

Since we are using real valued features, a majority of them shall have a cardinality close to the sample count. So, for this question, you are to write a function that takes in the summary table and the input dataset DataFrame, and drops the feature that have a cardinality less than 10. This feature should be dropped from both the data quality report summary table and from the actual input dataset DataFrame.

A useful method for this operation is:

* [pandas.DataFrame.drop](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html) (Make sure to use the inplace option otherwise it returns a copy)

In [None]:
def drop_low_card_data(summary_table:DataFrame, data:DataFrame) -> None:
    
    i = summary_table.loc[summary_table['Cardinality'] < 10].index.tolist()
    summary_table.drop(i, inplace = True)
    data.drop(i, axis = 1, inplace = True)
    print(i)
    print(data)
    
    
    # Your answer to Q3 goes here!

In [9]:
drop_low_card_data(summary_table, abt)

['TOTUSJH_no_zero_crossings', 'TOTBSQ_no_zero_crossings', 'TOTPOT_no_zero_crossings', 'TOTUSJZ_no_zero_crossings', 'ABSNJZH_no_zero_crossings', 'SAVNCPP_no_zero_crossings', 'USFLUX_no_zero_crossings', 'TOTFZ_no_zero_crossings', 'MEANPOT_no_zero_crossings', 'EPSZ_no_zero_crossings', 'MEANSHR_no_zero_crossings', 'SHRGT45_no_zero_crossings', 'MEANGAM_no_zero_crossings', 'MEANGBT_no_zero_crossings', 'MEANGBZ_no_zero_crossings', 'MEANGBH_no_zero_crossings', 'TOTFY_no_zero_crossings', 'TOTFX_no_zero_crossings', 'EPSY_no_zero_crossings', 'EPSX_no_zero_crossings', 'R_VALUE_no_zero_crossings']


### Q4 (25 points)
#### Drop features with excessive NaN

Again, using the quality report summary table that is returned from the function you wrote for Q2, we are going to continue investigating our data. For this, you should still be using the table returned for the [Full Partition 1 feature dataset](http://dmlab.cs.gsu.edu/solar/data/partition1ExtractedFeatures.csv) and not the toy dataset I provided for testing.

Like the features that were dropped for Q3, some of the extracted features don't work on all of the variates of the input multi-variate time series samples very well.  So, some of these features return an excessive number of not a number values.  These are not verry useful features, so we want to get rid of them before we continue. To do this, you are to write a function that takes in the summary table and the input dataset DataFrame, and drops the features that have more than 1% of the entries as null/nan values. Again, these features should be dropped from both the data quality report summary table and from the actual input dataset DataFrame.

As in Q3, a useful method for this operation is:

* [pandas.DataFrame.drop](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html) (Make sure to use the inplace option otherwise it returns a copy)

In [82]:
def drop_excessive_nan_data(summary_table:DataFrame, data:DataFrame) -> None:
    empty=""
    the_list[]
    for i in summary_table['Null Count'].to_numpy():
        if ((i * 100) / len(summary_table['Null Count'].to_numpy()) > .01):
            the_list.append(empty)
        else:
            the_list.append(i)
    summary_table['Null Count'] = the_list
    locate = summary_table.loc[summary_table['Null Count'] == empty]
    summary_table.drop(locate, inplace=True)
    data.drop(locate, axis=1, inplace=True)
    print(data)
    # Your answer to Q4 here!

In [83]:
drop_excessive_nan_data(summary_table, abt)

    TOTUSJH_min  TOTUSJH_max  TOTUSJH_median  TOTUSJH_mean  TOTUSJH_stddev  \
0    103.728433   168.447571      125.901867    128.354127       15.077707   
1   3304.799233  4129.596653     3523.029578   3580.248772      190.250948   
2     13.292757    51.719175       29.843361     32.710697       12.338905   
3     94.138307   154.257141      111.495061    116.882144       18.198159   
4     24.938772    51.677508       40.623843     39.183960        6.368641   
5     18.528046    29.150427       23.787601     23.845579        2.364224   
6    203.451625   274.042767      226.589675    231.499210       16.510900   
7     25.429072    36.852150       30.096294     30.344992        2.449060   
8    279.397983   391.691371      353.734269    343.871652       35.560706   
9     51.705487    67.091729       59.893353     59.238280        4.370794   
10   114.733135   168.124143      138.336284    139.713102       11.686581   
11   118.578497   280.738636      214.478707    204.633954      

Now that you have the data cleaned up a little, save the results of both your summary table and your analytics base table using the [pandas.to_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html) method. We will want to use these results for the next homework assignment.

In [89]:
df = pd.DataFrame()
out_dir = r'C:/Users/akash/OneDrive/Desktop/Data Science/hw2'
out_summary_table_name = r'data_quality_table.csv'
out_abt_name = r'cleaned_partition1ExtractedFeatures.csv'
df.to_csv(out_dir,out_summary_table_name)
df.to_csv(out_dir,out_abt_name)

In [86]:
# Save here!