# Creating the SimpleFrame Class

## Introduction
This project practices intermediate Python skills and object-oriented principles by building a class from scratch. The SimpleFrame class will be used with Spotify data to answer two questions:

* Which song had the highest number of plays in one day?
* Which song had the lowest number of plays in one day?

## Designing Our Class
SimpleFrame should make it easy for us to **load**, **preview**, **manipulate**, and make **calculations** with our **data**.

To preview our data, we’ll need to:
* Be able to **view** the **first five rows**
* Be able to **view** the **shape of our data**
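
As a rough sketch of what these two preview operations amount to, the example below reads a small in-memory CSV with `csv.reader`, splits the header from the rows, and derives a five-row preview and a shape. The sample rows and the names `sample_csv`, `first_five`, and `shape` are made up for illustration and are not part of the Spotify data.

```python
import csv
import io

# Made-up sample rows, used only for illustration.
sample_csv = io.StringIO(
    "Track Name,Artist,Streams\n"
    "Song A,Artist 1,300\n"
    "Song B,Artist 2,150\n"
    "Song C,Artist 1,220\n"
)

rows = list(csv.reader(sample_csv))
columns, data = rows[0], rows[1:]   # header row, then data rows

first_five = [columns] + data[:5]   # header plus up to five data rows
shape = (len(data), len(columns))   # (number of rows, number of columns)

print(first_five)
print(shape)
```

Slicing with `data[:5]` is safe even when there are fewer than five rows, which is why the preview here simply returns all three.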

To manipulate our data, we’ll need to:
* **Add** new columns
* Be able to **apply values** to **columns**
* Be able to **subset** our data
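
The three manipulations above can be sketched on a plain list of rows before wrapping them in a class. The data and the `Checked` column are hypothetical; the point is only to show that each operation is a loop over rows plus a column index lookup.

```python
# Made-up sample data in a header-plus-rows layout.
columns = ["Track Name", "Artist", "Streams"]
data = [
    ["Song A", "Artist 1", "300"],
    ["Song B", "Artist 2", "150"],
    ["Song C", "Artist 1", "220"],
]

# Add a new column: extend the header and give every row a placeholder value.
columns.append("Checked")
for row in data:
    row.append(0)

# Apply one value to every row of a column.
checked_idx = columns.index("Checked")
for row in data:
    row[checked_idx] = 1

# Subset: keep only rows whose "Artist" cell equals the value we want.
artist_idx = columns.index("Artist")
artist_1_rows = [row for row in data if row[artist_idx] == "Artist 1"]

print(artist_1_rows)
```

Comparing against the specific column (`row[artist_idx]`) rather than testing membership in the whole row avoids accidental matches when the same text appears in a different column.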

To make calculations, we’ll need to:
* Find the **minimum**
* Find the **maximum**
* Find the **mean**
* Find the **standard deviation**
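
A minimal sketch of these four calculations, using Python's built-ins and the standard `statistics` module on made-up stream counts. One detail worth showing: `csv.reader` yields every cell as a string, so the values need converting to numbers first.

```python
from statistics import mean, stdev

# Made-up stream counts. They start as strings, as csv.reader would return them.
raw_streams = ["300", "150", "220", "130"]

# Convert before calculating: comparing strings is lexicographic,
# so "1000" would sort before "300" without this step.
streams = [int(value) for value in raw_streams]

print(min(streams))    # smallest value
print(max(streams))    # largest value
print(mean(streams))   # arithmetic mean
print(stdev(streams))  # sample standard deviation (n - 1 denominator)
```

Note that `statistics.stdev` computes the sample standard deviation; `statistics.pstdev` is the population version if that is what the analysis calls for.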

## Creating the Class and Attributes

### Translating our definitions into objects
* SimpleFrame -> Class
* Load -> Method
* Data -> Attribute
* Columns -> Attribute

### Preview
* View the first five rows -> Method
* View the shape of our data -> Method

### Manipulate
* Add new columns -> Method
* Apply values to columns -> Method
* Subset our data -> Method

### Calculations
* Minimum -> Method
* Maximum -> Method
* Mean -> Method
* Standard deviation -> Method

In [6]:
import csv
from statistics import mean, stdev, median, mode

class SimpleFrame():
    def __init__(self,filename):
        self.filename = filename
    
    def read_data(self):
        # newline='' is what the csv module recommends when opening files
        with open(self.filename, 'r', newline='') as f:
            file_data = csv.reader(f)
            data_list = list(file_data)
            # separate data and headers
            self.data = data_list[1:]
            self.columns = data_list[0]
    
    # Preview methods
    def head(self):
        return [self.columns] + self.data[:5]
    
    def shape(self):
        num_rows = len(self.data)
        num_columns = len(self.columns)
        return (num_rows, num_columns)
    
    # Manipulate methods
    def new_column(self, column_name):
        self.columns.append(column_name)
        for row in self.data:
            row.append(0)
        
    def apply(self, column_name, new_value):
        column_idx = self.columns.index(column_name)
        for row in self.data:
            row[column_idx] = new_value
            
    def subset(self,column_name, row_value):
        column_idx = self.columns.index(column_name)
        subset_data = []
        for row in self.data:
            # match only on the requested column, not on any cell in the row
            if row[column_idx] == row_value:
                subset_data.append(row)
        return subset_data
    
    # Calculations
    def summary_stats(self, column_name):
        column_idx = self.columns.index(column_name)
        column_data = [int(row[column_idx]) for row in self.data]
        print("Summary Stats for column - " + column_name)
        print("----------------------------------------")
        print("The mean is {}".format(mean(column_data)))
        print("The median is {}".format(median(column_data)))
        print("The mode is {}".format(mode(column_data)))
        print("The standard deviation is {}".format(stdev(column_data)))
        
    def maximum(self,column_name):
        column_idx = self.columns.index(column_name)
        return max(self.data,key = lambda x: int(x[column_idx]))
    
    def minimum(self,column_name):
        column_idx = self.columns.index(column_name)
        return min(self.data,key = lambda x: int(x[column_idx]))
    
    def percentile_ranking(self, song_title):
        streams_col_idx = self.columns.index("Streams")
        track_col_idx = self.columns.index("Track Name")
        # sort rows by stream count, lowest first
        ordered_data = sorted(self.data, key=lambda x: int(x[streams_col_idx]))
        track_names = [row[track_col_idx] for row in ordered_data]
        if song_title not in track_names:
            return "No such song."
        # position of the song's first (lowest-stream) row in the ordering
        track_pos = track_names.index(song_title)
        num_rows = self.shape()[0]
        return round(100 * (num_rows - track_pos) / num_rows)
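
To make the arithmetic in `percentile_ranking` concrete, here is a small standalone sketch that applies the same formula to four made-up rows. The helper name `percentile_from_position` and the sample data are hypothetical; only the formula mirrors the method above.

```python
def percentile_from_position(track_pos, num_rows):
    """Apply the same formula as SimpleFrame.percentile_ranking."""
    return round(100 * (num_rows - track_pos) / num_rows)

# Made-up (track name, streams) rows, for illustration only.
rows = [("Song A", 300), ("Song B", 150), ("Song C", 220), ("Song D", 130)]

# Sort ascending by streams, as the method does.
ordered = sorted(rows, key=lambda row: row[1])
names = [name for name, _ in ordered]

rankings = {
    name: percentile_from_position(names.index(name), len(names))
    for name in names
}
print(rankings)
```

With this formula the row with the fewest streams sits at position 0 and scores 100, while the row with the most streams scores lowest (25 here). Read that way, the value is the percentage of rows at or above the song's position in the ascending ordering, so a smaller number corresponds to more streams.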

## Making our Calculations

In [7]:
s = SimpleFrame("music_data.csv")
s.read_data()

print("Data Shape:", s.shape(),'\n')
s.new_column('hello')
s.apply('hello',7)
print("Data Preview:",s.head(),'\n')
#print("Data Subset - Shakira:", s.subset("Artist","Shakira"), "\n")

print("Percentile Ranking - Chantaje by Shakira:", s.percentile_ranking('Chantaje'),'\n')

s.summary_stats("Streams")

print('\n')
print(s.columns)
print(s.maximum("Streams"))
print(s.minimum("Streams"))

Data Shape: (37100, 6) 

Data Preview: [['', 'Track Name', 'Artist', 'Streams', 'Date', 'Region', 'hello'], ['0', 'Reggaetón Lento (Bailemos)', 'CNCO', '19272', '2017-01-01', 'ec', 7], ['1', 'Chantaje', 'Shakira', '19270', '2017-01-01', 'ec', 7], ['2', 'Otra Vez (feat. J Balvin)', 'Zion & Lennox', '15761', '2017-01-01', 'ec', 7], ['3', "Vente Pa' Ca", 'Ricky Martin', '14954', '2017-01-01', 'ec', 7], ['4', 'Safari', 'J Balvin', '14269', '2017-01-01', 'ec', 7]] 

Percentile Ranking - Chantaje by Shakira: 91 

Summary Stats for column - Streams
----------------------------------------
The mean is 6551.852857142857
The median is 4753.5
The mode is 3185
The standard deviation is 4835.224414499119


['', 'Track Name', 'Artist', 'Streams', 'Date', 'Region', 'hello']
['2700', 'Despacito (Featuring Daddy Yankee)', 'Luis Fonsi', '64238', '2017-01-28', 'ec', 7]
['5099', 'Por Fin Te Encontré', 'Cali Y El Dandee', '1993', '2017-02-20', 'ec', 7]


## Results

The song with the highest number of streams in one day in this dataset was Despacito (Featuring Daddy Yankee) by Luis Fonsi, with 64,238 streams on 2017-01-28.

The song with the lowest number of streams in one day in this dataset was Por Fin Te Encontré by Cali Y El Dandee, with 1,993 streams on 2017-02-20.
