# Converting Rates to Totals

All of the features in our data are expressed as rates, and it may be useful to work with their totals instead. Here I convert every feature in the rate format "feature/60" back to plain "feature".

In [7]:
import pandas as pd
import numpy as np
import os

In [8]:
# Load in our data
filepath = '../../Data/entitiesResolved/merged_data_clean.csv'
data = pd.read_csv(filepath)

Conveniently, TOI is already in the format we need: rather than minutes:seconds, it has already been converted to total minutes.

In [9]:
data['TOI'].head(-5)

0         951.616667
1        1754.250000
2         546.150000
3        1374.483333
4        1212.050000
            ...     
12947       6.350000
12948     876.383333
12949     964.033333
12950       8.683333
12951    1292.750000
Name: TOI, Length: 12952, dtype: float64

The second step is to find all columns that are in a rate format.

In [10]:
# Keep only the columns whose name contains '/60' (the rate features)
filtered_columns = data.filter(like='/60')
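
As a small illustration of what this selection does, the sketch below runs it on a made-up frame. The column names `Goals/60` and `Shots/60` are hypothetical and are not taken from the real dataset. Note that `like` matches the substring anywhere in the column name, so `regex=r'/60$'` is a stricter alternative if only a trailing "/60" should count.

```python
import pandas as pd

# Hypothetical toy frame; these column names are illustrative only.
toy = pd.DataFrame({
    "Player": ["A", "B"],
    "TOI": [120.0, 30.0],
    "Goals/60": [1.5, 2.0],
    "Shots/60": [9.0, 12.0],
})

# Substring match: any column whose name contains '/60'
rate_cols_like = toy.filter(like="/60").columns.tolist()

# Suffix-only match: the name must end with '/60'
rate_cols_regex = toy.filter(regex=r"/60$").columns.tolist()

print(rate_cols_like)
print(rate_cols_regex)
```

On this toy frame both calls select the same two rate columns; they would differ only if some column had "/60" in the middle of its name.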

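The third step is to convert each of these columns back to its total. Since each rate is per 60 minutes and TOI is in total minutes, the total is the rate multiplied by TOI / 60, rounded to the nearest integer.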

In [11]:
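for column in filtered_columns.columns:
    # Drop the trailing '/60' to get the name of the total column
    new_col = column[:-3]
    # total = rate per 60 minutes * (minutes played / 60), rounded to a whole count
    data[new_col] = np.round(data[column] * (data['TOI'] / 60)).astype('int')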
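
To make the arithmetic concrete, here is a minimal sketch of the same conversion on a made-up frame. The column names and values are hypothetical. One assumption differs from the cell above: the sketch casts to pandas' nullable `Int64` dtype, because a plain `int` cast raises an error if a rate is missing (NaN). Whether the real data contains missing rates is not known here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the third row has a missing rate on purpose.
toy = pd.DataFrame({
    "TOI": [120.0, 30.0, 60.0],      # total minutes
    "Goals/60": [1.5, 2.0, np.nan],  # rate per 60 minutes
})

suffix = "/60"
for col in [c for c in toy.columns if c.endswith(suffix)]:
    total_col = col[:-len(suffix)]
    # total = rate * (minutes / 60); nullable Int64 keeps missing values as <NA>
    toy[total_col] = (toy[col] * toy["TOI"] / 60).round().astype("Int64")

print(toy)
```

The first row works out to 1.5 * (120 / 60) = 3 goals and the second to 2.0 * (30 / 60) = 1 goal, while the missing rate stays missing instead of raising.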

In [12]:
# Save the merged data to a CSV file
output_dir = '../../Data/entitiesResolved'
os.makedirs(output_dir, exist_ok=True)  # create the directory if it is missing

output_file = os.path.join(output_dir, 'merged_data_final.csv')
data.to_csv(output_file, index=False)