## Train a Naive Bayes Classifier Model
The model will be trained using pandas and scikit-learn.
The training data comes from https://www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction

I am running this through VS Code, using a Docker container. You may also use the `ipynb` file in other Jupyter Notebook style setups. Consult the Jupyter Notebook documentation for options.

**To download the data, run this cell.**

If you are running this cell in the Docker container, it will download the data. If not, you will need to navigate to the `KAGGLE_DATA_URL` and download the data manually.

In [95]:
import os
from data import download_kaggle_dataset
KAGGLE_DATA_URL = "https://www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction"
DATA_PATH = os.path.join(os.getcwd(), "data", "naive_bayes")
download_kaggle_dataset(KAGGLE_DATA_URL, DATA_PATH)

/workspaces/MS365/src/data/naive_bayes contains data. Delete the file(s) if you want to download again.


**Import the necessary python packages**

Import `pandas`, `sklearn.preprocessing.OneHotEncoder`, `sklearn.preprocessing.LabelEncoder`, `sklearn.preprocessing.OrdinalEncoder`, `sklearn.model_selection.train_test_split`, `sklearn.naive_bayes.MultinomialNB`, `sklearn.metrics.accuracy_score`, `sklearn.metrics.classification_report`, `sklearn.metrics.confusion_matrix`, `sklearn.metrics.ConfusionMatrixDisplay`, `imblearn.over_sampling.RandomOverSampler`, `imblearn.under_sampling.RandomUnderSampler`, and `matplotlib.pyplot`. Typically, packages such as `pandas` and `matplotlib.pyplot` are imported with an alias (for example, `pd` and `plt`). I will not be following that strategy here.

By default, `pandas` truncates the display of dataframes that have many rows or many columns. You can change this behavior with the `set_option()` function. I have set `display.max_columns` to `None` to show all columns. This could result in long run times for cells that display the data, if there are many columns to display. This is expected behavior for this analysis.

If you are running the Docker container or if you are using [Google Colab](https://colab.research.google.com/), the `pip install` has already been done. If not, please consult the documentation for your Jupyter Notebook environment to learn how to install the needed packages.

In [96]:
import pandas
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, OrdinalEncoder
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, ConfusionMatrixDisplay
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
import matplotlib.pyplot
pandas.set_option("display.max_columns", None)
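
If changing the display setting for the whole notebook is not desirable, one possible alternative is `pandas.option_context`, which applies an option only inside a `with` block. The sketch below is illustrative only; the `wide` dataframe is made up and is not part of this analysis.

```python
import pandas

# A made-up dataframe with 30 columns, wide enough to be truncated by default.
wide = pandas.DataFrame([range(30)], columns=[f"col_{i}" for i in range(30)])

# Remember the current setting so it can be compared afterwards.
default_max = pandas.get_option("display.max_columns")

# Inside the block, every column is shown; outside, the old setting applies.
with pandas.option_context("display.max_columns", None):
    inside = pandas.get_option("display.max_columns")
    print(wide)

after = pandas.get_option("display.max_columns")
print(inside, after)
```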

**Import the data for analysis**

There will be two files downloaded from the Kaggle site. The user uploaded a file with training data and a file with testing data. This example will go through the process of splitting the data into a training and testing set. Thus, the two files will need to be joined together, before they are separated again for the Naive Bayes model.

Use `pandas.read_csv` to import both files. The files were named `train.csv` and `test.csv`. The data from these two files will be imported to the variables `train_file` and `test_file`. The two variables can be joined to make a single dataframe using the `concat` function. The first parameter of `concat` is a list of the dataframes to be joined. The code also sets `ignore_index=True`, which gives the combined dataframe a fresh 0-based index instead of repeating the index of each file. There are other parameters that can be set. Please consult the function's [documentation](https://pandas.pydata.org/docs/reference/api/pandas.concat.html) for more information.

The concatenated data will be saved to the `df` variable. To view the data, use the `head()` method. The default value is `n=5`. I have used 15 to see the top 15 rows of data.
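
To illustrate what `ignore_index=True` changes, here is a minimal sketch on two made-up dataframes (`first` and `second` are hypothetical stand-ins for the two files). Without the parameter, the row labels from each input are kept and can repeat.

```python
import pandas

# Two tiny, made-up dataframes standing in for the two CSV files.
first = pandas.DataFrame({"id": [1, 2], "satisfaction": ["satisfied", "neutral or dissatisfied"]})
second = pandas.DataFrame({"id": [3], "satisfaction": ["satisfied"]})

# Default behavior: each input keeps its own row labels, so 0 appears twice.
kept = pandas.concat([first, second])

# With ignore_index=True the result is relabeled 0..n-1.
renumbered = pandas.concat([first, second], ignore_index=True)

print(list(kept.index))        # [0, 1, 0]
print(list(renumbered.index))  # [0, 1, 2]
```

A unique index matters later in this analysis, because the one-hot encoded columns are joined back to `df` by index.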


In [97]:
train_file = pandas.read_csv(os.path.join(DATA_PATH, "train.csv"))
test_file = pandas.read_csv(os.path.join(DATA_PATH, "test.csv"))

df = pandas.concat([train_file, test_file], ignore_index=True)
df.head(n=15)

Unnamed: 0.1,Unnamed: 0,id,Gender,Customer Type,Age,Type of Travel,Class,Flight Distance,Inflight wifi service,Departure/Arrival time convenient,Ease of Online booking,Gate location,Food and drink,Online boarding,Seat comfort,Inflight entertainment,On-board service,Leg room service,Baggage handling,Checkin service,Inflight service,Cleanliness,Departure Delay in Minutes,Arrival Delay in Minutes,satisfaction
0,0,70172,Male,Loyal Customer,13,Personal Travel,Eco Plus,460,3,4,3,1,5,3,5,5,4,3,4,4,5,5,25,18.0,neutral or dissatisfied
1,1,5047,Male,disloyal Customer,25,Business travel,Business,235,3,2,3,3,1,3,1,1,1,5,3,1,4,1,1,6.0,neutral or dissatisfied
2,2,110028,Female,Loyal Customer,26,Business travel,Business,1142,2,2,2,2,5,5,5,5,4,3,4,4,4,5,0,0.0,satisfied
3,3,24026,Female,Loyal Customer,25,Business travel,Business,562,2,5,5,5,2,2,2,2,2,5,3,1,4,2,11,9.0,neutral or dissatisfied
4,4,119299,Male,Loyal Customer,61,Business travel,Business,214,3,3,3,3,4,5,5,3,3,4,4,3,3,3,0,0.0,satisfied
5,5,111157,Female,Loyal Customer,26,Personal Travel,Eco,1180,3,4,2,1,1,2,1,1,3,4,4,4,4,1,0,0.0,neutral or dissatisfied
6,6,82113,Male,Loyal Customer,47,Personal Travel,Eco,1276,2,4,2,3,2,2,2,2,3,3,4,3,5,2,9,23.0,neutral or dissatisfied
7,7,96462,Female,Loyal Customer,52,Business travel,Business,2035,4,3,4,4,5,5,5,5,5,5,5,4,5,4,4,0.0,satisfied
8,8,79485,Female,Loyal Customer,41,Business travel,Business,853,1,2,2,2,4,3,3,1,1,2,1,4,1,2,0,0.0,neutral or dissatisfied
9,9,65725,Male,disloyal Customer,20,Business travel,Eco,1061,3,3,3,4,2,3,3,2,2,3,4,4,3,2,0,0.0,neutral or dissatisfied


**Clean the data**

There are two columns that are not needed for this analysis: `Unnamed: 0` and `id`. Run [`drop()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html) passing the two columns in the `labels` parameter. The use of the other parameters can be found in the documentation.

In [98]:
df.drop(labels=["Unnamed: 0", "id"], axis=1, inplace=True)

Check the dataframe for missing data. The [`count()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.count.html) method counts the number of non-missing (non-NA) values in each of the columns.

Most of the columns have 129,880 values. The column `Arrival Delay in Minutes` has only 129,487, so 393 values are missing. The [`fillna`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html) method can be used to replace the missing values with another value. A common strategy is to use the median, mean, or mode to fill in missing data. This provides data for every row without altering the statistics of the column too much.

Use the `mean()` method on the column to fill in the missing data.

*Hint: Using the `inplace=True` parameter may cause a warning about setting a value on a copy. To avoid that warning, assign the values back to the `df["Arrival Delay in Minutes"]` column.*
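
The fill strategy can be sketched on a made-up column first. The `toy` dataframe and its values are assumptions for illustration only. The sketch also uses `isna().sum()`, which counts missing values directly instead of inferring them from `count()`.

```python
import pandas

# A made-up column with one missing value.
toy = pandas.DataFrame({"Arrival Delay in Minutes": [18.0, None, 0.0, 6.0]})

# Count the missing values before filling.
missing_before = toy["Arrival Delay in Minutes"].isna().sum()

# mean() skips missing values by default: (18 + 0 + 6) / 3 = 8.0
column_mean = toy["Arrival Delay in Minutes"].mean()

# Assign the filled column back instead of using inplace=True.
toy["Arrival Delay in Minutes"] = toy["Arrival Delay in Minutes"].fillna(column_mean)
missing_after = toy["Arrival Delay in Minutes"].isna().sum()

print(missing_before, column_mean, missing_after)  # 1 8.0 0
```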

In [99]:
print(df.count())
df["Arrival Delay in Minutes"] = df["Arrival Delay in Minutes"].fillna(df["Arrival Delay in Minutes"].mean())

Gender                               129880
Customer Type                        129880
Age                                  129880
Type of Travel                       129880
Class                                129880
Flight Distance                      129880
Inflight wifi service                129880
Departure/Arrival time convenient    129880
Ease of Online booking               129880
Gate location                        129880
Food and drink                       129880
Online boarding                      129880
Seat comfort                         129880
Inflight entertainment               129880
On-board service                     129880
Leg room service                     129880
Baggage handling                     129880
Checkin service                      129880
Inflight service                     129880
Cleanliness                          129880
Departure Delay in Minutes           129880
Arrival Delay in Minutes             129487
satisfaction                    

The Naive Bayes model is dependent on categorical data, which is used to calculate probabilities based on the data provided. The data in our dataset will need to be categorical or it will need to be dropped. Reviewing the data types will help to decide how to process each column. Review the data in `df` shown above and use the `dtypes` attribute on `df` to show all the data types. The `dtypes` attribute shows that there are many numerical data types. However, reviewing the data in the table shows that many of those columns contain ordinal categorical data. Thus, many of them can be retained and converted.
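
One possible way to support the visual review is to count the distinct values in each numerical column: a column with only a handful of distinct values is a candidate for ordinal categorical data. The sketch below uses a made-up `toy` dataframe, and the `max_levels` threshold is an arbitrary assumption chosen for the toy data, not a rule.

```python
import pandas

# A made-up sample with two continuous columns and two rating columns.
toy = pandas.DataFrame({
    "Age": [13, 25, 26, 61, 47, 52],
    "Flight Distance": [460, 235, 1142, 214, 1276, 2035],
    "Seat comfort": [5, 1, 5, 5, 2, 5],
    "Cleanliness": [5, 1, 5, 3, 2, 4],
    "Class": ["Eco Plus", "Business", "Business", "Business", "Eco", "Business"],
})

# Keep only the numerical columns, then count distinct values per column.
numeric = toy.select_dtypes(include="number")
unique_counts = numeric.nunique()

# Hypothetical threshold: treat columns with few distinct values as ordinal.
max_levels = 5
likely_ordinal = unique_counts[unique_counts <= max_levels].index.tolist()

print(unique_counts)
print(likely_ordinal)  # ['Seat comfort', 'Cleanliness']
```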

In [100]:
df.dtypes

Gender                                object
Customer Type                         object
Age                                    int64
Type of Travel                        object
Class                                 object
Flight Distance                        int64
Inflight wifi service                  int64
Departure/Arrival time convenient      int64
Ease of Online booking                 int64
Gate location                          int64
Food and drink                         int64
Online boarding                        int64
Seat comfort                           int64
Inflight entertainment                 int64
On-board service                       int64
Leg room service                       int64
Baggage handling                       int64
Checkin service                        int64
Inflight service                       int64
Cleanliness                            int64
Departure Delay in Minutes             int64
Arrival Delay in Minutes             float64
satisfacti

The following columns are nominal categorical data. The data type shows that they are `object` data. I will convert these to `category` type. Nominal categorical data does not have a ranked order.
* `Gender`
* `Customer Type`
* `Type of Travel`
* `Class`
* `satisfaction`

The following columns are ordinal categorical data. The data type shown is numerical in nature. The data are ordinal because there is a rank order associated with the value of the number; one being less than five on the scale. Convert these columns to the `category` type as well.
* `Inflight wifi service`
* `Departure/Arrival time convenient`
* `Ease of Online booking`
* `Gate location`
* `Food and drink`
* `Online boarding`
* `Seat comfort`
* `Inflight entertainment`
* `On-board service`
* `Leg room service`
* `Baggage handling`
* `Checkin service`
* `Inflight service`
* `Cleanliness`

In [101]:
# Nominal columns (no rank order) and ordinal columns (ranked survey scores).
nominal_columns = ["Gender", "Customer Type", "Type of Travel", "Class", "satisfaction"]
ordinal_columns = [
    "Inflight wifi service", "Departure/Arrival time convenient", "Ease of Online booking",
    "Gate location", "Food and drink", "Online boarding", "Seat comfort",
    "Inflight entertainment", "On-board service", "Leg room service",
    "Baggage handling", "Checkin service", "Inflight service", "Cleanliness",
]

# Convert every listed column to the category type.
for column in nominal_columns + ordinal_columns:
    df[column] = df[column].astype("category")
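
Note that `astype("category")` creates an unordered categorical, so the rank order of the ordinal columns is not recorded in the data type. If that order should be kept, one option is an ordered `CategoricalDtype`. This is a sketch only: the `ratings` series is made up, and the 1 to 5 scale is an assumption that should be adjusted to the values actually present, because values missing from `categories` become `NaN`.

```python
import pandas
from pandas.api.types import CategoricalDtype

# Made-up survey scores.
ratings = pandas.Series([3, 1, 5, 2, 5])

# Plain conversion: categories exist, but they carry no order.
unordered = ratings.astype("category")

# Ordered conversion: the assumed scale 1..5 is declared explicitly.
rating_dtype = CategoricalDtype(categories=[1, 2, 3, 4, 5], ordered=True)
ordered = ratings.astype(rating_dtype)

print(unordered.cat.ordered)    # False
print(ordered.cat.ordered)      # True
print(ordered.max())            # 5
print((ordered >= 3).tolist())  # [True, False, True, False, True]
```

The analysis in this notebook does not depend on the ordered type; the sketch only shows how the ranking could be preserved.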

Pandas provides a function for converting continuous numerical values into binned categorical data. The function is [`cut()`](https://pandas.pydata.org/docs/reference/api/pandas.cut.html). Review the documentation to get a better understanding of all the options. It will also help you to understand what the output means when the data have been binned.

The `cut()` function requires two pieces of information: 1) the column where the data resides and 2) the bins in which the data should be grouped. Choosing the bins is mostly a manual process, where the goal is for each bin to hold a similar number of rows, so that one group does not over-influence the analysis. As a first pass, one could pass in the column of the data as the `x=` parameter and a numerical value as the `bins=` parameter, then run a count on the output to see how many values are in each bin. This will provide a starting point. It is improbable that the first pass results in perfectly binned data.

Given that the `Age` column contains ages of people, there are certain assumptions that can be made about the data. First, it is likely that there are very few people over the age of 100. Second, because the data is from an airline, it likely does not include many people under the age of 18. Since the data contains ages of people, grouping the data by decade of life could be a good starting point. This is where this analysis will start.

Create a new column in `df` and name it `binned_age`. Use `pandas.cut()`, assigning `df["Age"]` to the `x=` parameter and `10` to the `bins=` parameter; the intent of using 10 is to separate the data into decades of life. Assign the result to the new column. Run a `groupby()` on the new column and count the rows to see how many values fall under each label. In the labels, `(` or `)` marks an exclusive bound, meaning the value is not included in the group, and `[` or `]` marks an inclusive bound, meaning the value is included in the group. The output will show that some groups are larger than others, with the youngest bin (roughly ages 7 to 14) and the bins above age 69 being the smallest. The rest of the groups seem to be fairly evenly distributed.

In [102]:
df["binned_age"] = pandas.cut(x=df["Age"], bins=10)
df["binned_age"].groupby(df["binned_age"]).count()


binned_age
(6.922, 14.8]     6460
(14.8, 22.6]     12431
(22.6, 30.4]     21989
(30.4, 38.2]     19491
(38.2, 46.0]     24762
(46.0, 53.8]     18732
(53.8, 61.6]     17106
(61.6, 69.4]      7179
(69.4, 77.2]      1499
(77.2, 85.0]       231
Name: binned_age, dtype: int64

Although the value 10 was meant to produce one bin per decade of life, the data have not been binned by decade. When `bins` is an integer, `cut()` splits the range of the data (here, ages 7 to 85) into that many equal-width bins, so each bin spans 7.8 years rather than 10. The bin edges are decimal values, and the bins do not contain similar counts. This can be solved by specifying the values by which to bin the data.
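
The difference between an integer `bins` value and explicit edges can be sketched on a few made-up ages (the `ages` series is an assumption; only its minimum of 7 and maximum of 85 mirror the dataset). With explicit edges every 10 years, the bins do follow the decades.

```python
import pandas

# Made-up ages spanning the same range as the dataset (7 to 85).
ages = pandas.Series([7, 18, 25, 40, 62, 85])

# Integer bins: ten equal-width intervals over the observed range.
equal_width = pandas.cut(x=ages, bins=10)
intervals = equal_width.cat.categories
print(len(intervals))   # 10
print(intervals[1])     # (14.8, 22.6]
print((85 - 7) / 10)    # 7.8, the width of each interval

# Explicit edges: one interval per decade, 0-10, 10-20, and so on up to 100.
decades = pandas.cut(x=ages, bins=list(range(0, 101, 10)))
print(decades.tolist())
```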

The bin edges can also be specified explicitly by passing a list of values as the `bins` parameter. Reviewing the data, it seems that the first bin can run from 0 to approximately 20; this removes the small bin for the people under 18. By default, each bin excludes its left edge and includes its right edge, unless otherwise specified in the parameters. Thus, if the first two values in the list are 0 and 20, the first bin will include all ages greater than 0 and less than or equal to 20. For the rest of the bins, I have chosen a 5 year interval, which makes the above groups smaller and takes into consideration the fact that the first-pass bins were less than 10 years wide. The edges will be 0, 20, 25, 30, 35, 40, 45, 50, 55, and 60. After 60, the above groupings were small, so the final number in the list will be 100. Looking at the data will show that the maximum value in the `Age` column is 85, so the last number could be 85, but I have chosen 100. The list of bins has been saved to the `age_bins` variable.

The labels used in the column can be specified, or you can allow `cut()` to determine what they should be. I have created a list of labels so they are easier to understand. The list needs one label per bin, which is one fewer than the number of edges. The list of labels has been saved to the `age_labels` variable.

With `age_bins` and `age_labels`, it is now possible to use the pandas `cut()` function again, saving the results to the column `binned_age`. This will overwrite the values that were saved there in the previous attempt at binning. If you were to run a similar groupby-count as above, you would see that the groupings are much more similar in size. The values in the column will be the `age_labels` values; the `0-20` label will contain values greater than 0 and up to and including 20.

In [103]:
age_bins = [0, 20, 25, 30, 35, 40, 45, 50, 55, 60, 100]
age_labels = ["0-20", "21-25", "26-30", "31-35", "36-40", "41-45", "46-50", "51-55", "56-60", "61-"]
df["binned_age"] = pandas.cut(df["Age"], bins=age_bins, labels=age_labels)
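
To check how the edges behave, the same bins and labels can be applied to a few made-up ages that sit exactly on or next to an edge. The `ages` series below is an assumption for illustration; `age_bins` and `age_labels` are copied from the cell above.

```python
import pandas

age_bins = [0, 20, 25, 30, 35, 40, 45, 50, 55, 60, 100]
age_labels = ["0-20", "21-25", "26-30", "31-35", "36-40", "41-45", "46-50", "51-55", "56-60", "61-"]

# Made-up ages chosen to land on and just past the bin edges.
ages = pandas.Series([7, 20, 21, 25, 26, 60, 61, 85])

# Each bin excludes its left edge and includes its right edge.
binned = pandas.cut(ages, bins=age_bins, labels=age_labels)
print(binned.astype(str).tolist())
# ['0-20', '0-20', '21-25', '21-25', '26-30', '56-60', '61-', '61-']

# Count how many ages fall under each label, in label order.
counts = binned.value_counts(sort=False)
print(counts)
```

Running the same `value_counts(sort=False)` on `df["binned_age"]` is one way to do the groupby-count check described above.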

Now that the other columns have been converted to `category` and the `Age` column has been used to create `binned_age`, the rest of the columns can be dropped because they contain continuous numerical data. The `Age` column was binned to show how this can be done. The other columns will be dropped for this analysis. They could be important columns for the analysis, so it may be worth it to go through the binning process in a future analysis.

Drop the following columns. They will not be analyzed. Use the `drop()` method, passing in a list of the column names in the `labels=` parameter. Review the [documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html) to better understand the other parameters.
* `Departure Delay in Minutes`
* `Arrival Delay in Minutes`
* `Flight Distance`
* `Age`

In [104]:
df.drop(labels=["Departure Delay in Minutes", "Arrival Delay in Minutes", "Flight Distance", "Age"], axis=1, inplace=True)

The final step to clean up the data and get it ready to be used by a Naive Bayes model is to one-hot encode the nominal categorical data. These data do not have any hierarchy, so if the categories were encoded as numbers, the statistical model could read a ranking into them that does not exist. One-hot encoding avoids this by creating a column of 0 and 1 values for each category, which simply indicates whether that category is present. The columns that need to be one-hot encoded are: `Gender`, `Customer Type`, `Type of Travel`, `Class`, and `binned_age`. The column `satisfaction` will not be one-hot encoded.

Scikit-learn provides a class for one-hot encoding called [`OneHotEncoder`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html). Create a `OneHotEncoder` and assign it to `ohe`. Use the parameters `drop='first'` to drop the first category of each encoded column, which would otherwise be redundant, and `sparse_output=False` to avoid creating a sparse matrix. Review the documentation regarding the parameter options for more information.

Using the one-hot encoder variable, fit and transform the data using the `fit_transform` method, passing in the columns from `df` that need to be one-hot encoded. The output will be an array of the values one-hot encoded. Save the output to `ohe_df_data`. Create a dataframe from the output data by passing `ohe_df_data` and the `get_feature_names_out()` method into `pandas.DataFrame`. Save this to the `ohe_df` variable. Use the [`merge()`](https://pandas.pydata.org/docs/reference/api/pandas.merge.html) method from pandas to combine `ohe_df` with `df`. The parameters `left_index` and `right_index` indicate that the indexes should be used for joining the data together. Save the joined data to `df`, overwriting the old `df`.

Drop the columns that were used for the new one-hot encoded columns. They will no longer be necessary for this analysis.

In [105]:
ohe = OneHotEncoder(drop='first', sparse_output=False)
ohe_df_data = ohe.fit_transform(df[["Gender", "Customer Type", "Type of Travel", "Class", "binned_age"]])
ohe_df = pandas.DataFrame(ohe_df_data, columns=ohe.get_feature_names_out())
df = pandas.merge(ohe_df, df, left_index=True, right_index=True)
df.drop(columns=["Gender", "Customer Type", "Type of Travel", "Class", "binned_age"], inplace=True)

Review the data types and the top rows of the dataframe. This will show that all the remaining original columns are now `category` columns. The one-hot encoded columns are `float64`, which will still work for this analysis. They could be converted to `category`, but it is unnecessary. Running the `head()` method will also allow for visual inspection of the values in each of the columns. To see the printout from `dtypes` and the output of `head()` while running both lines of code in the same cell, wrap the `dtypes` call in a `print()` function. Otherwise, the Jupyter notebook will only show the value of the last expression in the code block.

In [106]:
print(df.dtypes)
df.head(15)

Gender_Male                           float64
Customer Type_disloyal Customer       float64
Type of Travel_Personal Travel        float64
Class_Eco                             float64
Class_Eco Plus                        float64
binned_age_21-25                      float64
binned_age_26-30                      float64
binned_age_31-35                      float64
binned_age_36-40                      float64
binned_age_41-45                      float64
binned_age_46-50                      float64
binned_age_51-55                      float64
binned_age_56-60                      float64
binned_age_61-                        float64
Inflight wifi service                category
Departure/Arrival time convenient    category
Ease of Online booking               category
Gate location                        category
Food and drink                       category
Online boarding                      category
Seat comfort                         category
Inflight entertainment            

Unnamed: 0,Gender_Male,Customer Type_disloyal Customer,Type of Travel_Personal Travel,Class_Eco,Class_Eco Plus,binned_age_21-25,binned_age_26-30,binned_age_31-35,binned_age_36-40,binned_age_41-45,binned_age_46-50,binned_age_51-55,binned_age_56-60,binned_age_61-,Inflight wifi service,Departure/Arrival time convenient,Ease of Online booking,Gate location,Food and drink,Online boarding,Seat comfort,Inflight entertainment,On-board service,Leg room service,Baggage handling,Checkin service,Inflight service,Cleanliness,satisfaction
0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3,4,3,1,5,3,5,5,4,3,4,4,5,5,neutral or dissatisfied
1,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3,2,3,3,1,3,1,1,1,5,3,1,4,1,neutral or dissatisfied
2,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2,2,2,2,5,5,5,5,4,3,4,4,4,5,satisfied
3,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2,5,5,5,2,2,2,2,2,5,3,1,4,2,neutral or dissatisfied
4,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,3,3,3,3,4,5,5,3,3,4,4,3,3,3,satisfied
5,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3,4,2,1,1,2,1,1,3,4,4,4,4,1,neutral or dissatisfied
6,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,2,4,2,3,2,2,2,2,3,3,4,3,5,2,neutral or dissatisfied
7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,4,3,4,4,5,5,5,5,5,5,5,4,5,4,satisfied
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1,2,2,2,4,3,3,1,1,2,1,4,1,2,neutral or dissatisfied
9,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3,3,3,4,2,3,3,2,2,3,4,4,3,2,neutral or dissatisfied


**Create X and Y datasets**

The column `satisfaction` will be the dependent variable. Create a variable called `y_column` and assign it the value `satisfaction`. All the other columns will be the independent variables. Use the `columns` attribute to get all the column names, convert them to a list, and assign the list to a variable called `x_columns`. Use the list's `remove` method to remove the `y_column` value from `x_columns`.

Use the column variables to create `x_data` and `y_data` variables filled with the respective data.

In [107]:
y_column = "satisfaction"
x_columns = list(df.columns)
x_columns.remove(y_column)

x_data = df[x_columns]
y_data = df[y_column]
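
As an alternative sketch, the same split can be written with `drop(columns=...)`, which returns a copy without the target column and avoids building the list of names by hand. The `toy` dataframe is made up for illustration.

```python
import pandas

# Made-up data with two feature columns and the target column.
toy = pandas.DataFrame({
    "Gender_Male": [1.0, 0.0],
    "Cleanliness": [5, 2],
    "satisfaction": ["satisfied", "neutral or dissatisfied"],
})

y_column = "satisfaction"

# drop() returns a new dataframe; toy itself is left unchanged.
x_data = toy.drop(columns=[y_column])
y_data = toy[y_column]

print(list(x_data.columns))  # ['Gender_Male', 'Cleanliness']
print(y_data.tolist())       # ['satisfied', 'neutral or dissatisfied']
```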

In [108]:
x_data

Unnamed: 0,Gender_Male,Customer Type_disloyal Customer,Type of Travel_Personal Travel,Class_Eco,Class_Eco Plus,binned_age_21-25,binned_age_26-30,binned_age_31-35,binned_age_36-40,binned_age_41-45,binned_age_46-50,binned_age_51-55,binned_age_56-60,binned_age_61-,Inflight wifi service,Departure/Arrival time convenient,Ease of Online booking,Gate location,Food and drink,Online boarding,Seat comfort,Inflight entertainment,On-board service,Leg room service,Baggage handling,Checkin service,Inflight service,Cleanliness
0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3,4,3,1,5,3,5,5,4,3,4,4,5,5
1,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3,2,3,3,1,3,1,1,1,5,3,1,4,1
2,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2,2,2,2,5,5,5,5,4,3,4,4,4,5
3,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2,5,5,5,2,2,2,2,2,5,3,1,4,2
4,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,3,3,3,3,4,5,5,3,3,4,4,3,3,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129875,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,3,3,3,1,4,3,4,4,3,2,4,4,5,4
129876,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4,4,4,4,4,4,4,4,4,5,5,5,5,4
129877,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2,5,1,5,2,1,2,2,4,3,4,5,4,2
129878,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3,3,3,3,4,4,4,4,3,2,5,4,5,4
