![image info](https://raw.githubusercontent.com/albahnsen/MIAD_ML_and_NLP/main/images/banner_1.png)

# Project 2 - Movie genre classification

The purpose of this project is for you to put into practice, in your respective work groups, your knowledge of preprocessing techniques, NLP predictive models, and model deployment. For its development, follow the instructions in the "Guía del proyecto 2: Clasificación de género de películas" (Project 2 guide: Movie genre classification).

**Submission**: The project must be submitted during week 8. However, it is important that you make progress during week 7 on modeling the problem and on part of the report, as indicated in the guide.

To submit, attach the self-contained PDF report to the project submission activity in week 8, and upload the predictions file to the [Kaggle competition](https://www.kaggle.com/t/2c54d005f76747fe83f77fbf8b3ec232).

## Data for movie genre prediction

![image info](https://raw.githubusercontent.com/albahnsen/MIAD_ML_and_NLP/main/images/moviegenre.png)

This project uses a dataset of movie genres. Each observation contains the title of a movie, its release year, its synopsis or plot (a summary of the story), and the genres it belongs to (a movie can belong to more than one genre). For example:
- Title: 'How to Be a Serial Killer'
- Plot: 'A serial killer decides to teach the secrets of his satisfying career to a video store clerk.'
- Genres: 'Comedy', 'Crime', 'Horror'

The goal is to use these data to predict, given the synopsis, the probability that a movie belongs to each of the genres.

We thank Professor Fabio González, Ph.D., and his student John Arevalo for providing this dataset. See https://arxiv.org/abs/1702.01992

## Example: test set predictions for the Kaggle submission
This section shows the format in which the prediction results must be saved so that they can be uploaded to the Kaggle competition.

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [10]:
# Import libraries
import pandas as pd
import os
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.metrics import r2_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.model_selection import RandomizedSearchCV

import pickle
import joblib

#from keras.preprocessing import sequence
from nltk.stem import WordNetLemmatizer
from nltk.stem.snowball import SnowballStemmer
import nltk
import math

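# NLTK resources used below: stopword list, WordNet (lemmatizer) and POS tagger model
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')  # named 'averaged_perceptron_tagger_eng' in recent NLTK releases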


from xgboost import XGBClassifier
import matplotlib.pyplot as plt

#from keras.models import Sequential
#from keras.layers import Dense
#from keras.layers import Dropout

#from keras.activations import relu, swish, sigmoid
#from sentence_transformers import SentenceTransformer

In [5]:
!pip uninstall keras

In [7]:
# Load the training and test data (zipped .csv files)
dataTraining = pd.read_csv('https://github.com/albahnsen/MIAD_ML_and_NLP/raw/main/datasets/dataTraining.zip', encoding='UTF-8', index_col=0)
dataTesting = pd.read_csv('https://github.com/albahnsen/MIAD_ML_and_NLP/raw/main/datasets/dataTesting.zip', encoding='UTF-8', index_col=0)

In [8]:
# Preview the training data
dataTraining.head()

|  | year | title | plot | genres | rating |
|---|---|---|---|---|---|
| 3107 | 2003 | Most | most is the story of a single father who takes... | ['Short', 'Drama'] | 8.0 |
| 900 | 2008 | How to Be a Serial Killer | a serial killer decides to teach the secrets o... | ['Comedy', 'Crime', 'Horror'] | 5.6 |
| 6724 | 1941 | A Woman's Face | in sweden , a female blackmailer with a disfi... | ['Drama', 'Film-Noir', 'Thriller'] | 7.2 |
| 4704 | 1954 | Executive Suite | in a friday afternoon in new york , the presi... | ['Drama'] | 7.4 |
| 2582 | 1990 | Narrow Margin | in los angeles , the editor of a publishing h... | ['Action', 'Crime', 'Thriller'] | 6.6 |


In [9]:
# Preview the test data
dataTesting.head()

|  | year | title | plot |
|---|---|---|---|
| 1 | 1999 | Message in a Bottle | who meets by fate , shall be sealed by fate .... |
| 4 | 1978 | Midnight Express | the true story of billy hayes , an american c... |
| 5 | 1996 | Primal Fear | martin vail left the chicago da ' s office to ... |
| 6 | 1950 | Crisis | husband and wife americans dr . eugene and mr... |
| 7 | 1959 | The Tingler | the coroner and scientist dr . warren chapin ... |


In [10]:
# Convert title and plot to lowercase
def minusculizar(df):
    df2 = df.copy()
    df2['title'] = df2['title'].apply(lambda x : x.lower())
    df2['plot'] = df2['plot'].apply(lambda x : x.lower())
    return df2

dataTraining_min  = minusculizar(dataTraining)
dataTraining_min

|  | year | title | plot | genres | rating |
|---|---|---|---|---|---|
| 3107 | 2003 | most | most is the story of a single father who takes... | ['Short', 'Drama'] | 8.0 |
| 900 | 2008 | how to be a serial killer | a serial killer decides to teach the secrets o... | ['Comedy', 'Crime', 'Horror'] | 5.6 |
| 6724 | 1941 | a woman's face | in sweden , a female blackmailer with a disfi... | ['Drama', 'Film-Noir', 'Thriller'] | 7.2 |
| 4704 | 1954 | executive suite | in a friday afternoon in new york , the presi... | ['Drama'] | 7.4 |
| 2582 | 1990 | narrow margin | in los angeles , the editor of a publishing h... | ['Action', 'Crime', 'Thriller'] | 6.6 |
| ... | ... | ... | ... | ... | ... |
| 8417 | 2010 | our family wedding | " our marriage , their wedding . " it ' s l... | ['Comedy', 'Romance'] | 4.9 |
| 1592 | 1984 | conan the destroyer | the wandering barbarian , conan , alongside ... | ['Action', 'Adventure', 'Fantasy'] | 5.8 |
| 1723 | 1955 | kismet | like a tale spun by scheherazade , kismet fol... | ['Adventure', 'Musical', 'Fantasy', 'Comedy', ... | 6.4 |
| 7605 | 1982 | the secret of nimh | mrs . brisby , a widowed mouse , lives in a... | ['Animation', 'Adventure', 'Drama', 'Family', ... | 7.6 |
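The `minusculizar` function lowercases row by row with `apply(lambda x: x.lower())`. A minimal alternative sketch, assuming only pandas, uses the vectorized `.str.lower()` accessor; the function name `to_lowercase` and the toy rows are hypothetical. One practical difference is that `.str.lower()` leaves missing values as missing, whereas `x.lower()` raises an `AttributeError` on a NaN.

```python
import pandas as pd

def to_lowercase(df: pd.DataFrame, columns=("title", "plot")) -> pd.DataFrame:
    """Return a copy of df with the given text columns lowercased."""
    out = df.copy()                      # leave the original frame untouched
    for col in columns:
        out[col] = out[col].str.lower()  # vectorized; missing values stay missing
    return out

# Hypothetical rows, only to illustrate the behavior
demo = pd.DataFrame({"title": ["Primal Fear", None],
                     "plot": ["Martin Vail left the Chicago DA's office", "A Plot"]})
print(to_lowercase(demo))
```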


Plots

In [11]:
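# Whitespace tokenization of all (lowercased) training plots
Listadeplots = dataTraining_min['plot'].tolist()
lista_palabras = []
for i in Listadeplots:
    lista_palabras.extend(i.split())
palabras_unicas_plot = list(set(lista_palabras))

# English stopwords from NLTK (a set makes each membership test fast)
stopwords_english = set(nltk.corpus.stopwords.words('english'))

# Token occurrences without stopwords, and the set of unique remaining tokens
lista_palabras_completa_filtrada = [palabra for palabra in lista_palabras if palabra.lower() not in stopwords_english]
lista_filtrada_plot = set(lista_palabras_completa_filtrada)

# Unique tokens before and after removing stopwords
len(palabras_unicas_plot), len(lista_filtrada_plot)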

(38734, 38584)
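The cell above keeps every token occurrence in a list and filters it against the NLTK stopword list; the two numbers it reports are the unique tokens before and after filtering. A minimal sketch of the same whitespace tokenization and filtering with `collections.Counter`, assuming a tiny hypothetical stopword list in place of NLTK's (the function name `token_stats` is illustrative):

```python
from collections import Counter

def token_stats(plots, stopwords):
    """Count whitespace tokens in all plots, before and after removing stopwords."""
    stop = set(stopwords)              # O(1) average membership test instead of scanning a list
    counts = Counter()
    for plot in plots:
        counts.update(plot.split())    # same whitespace tokenization as the notebook
    filtered = Counter({w: c for w, c in counts.items() if w.lower() not in stop})
    return counts, filtered

# Hypothetical plots and a tiny stand-in for NLTK's English stopword list
plots = ["a serial killer decides to teach the secrets", "the killer is a clerk"]
counts, filtered = token_stats(plots, ["a", "the", "to", "is"])
print(len(counts), len(filtered))      # -> 10 6
```

Keeping counts instead of a flat list of occurrences also gives the per-word frequencies needed later for the Pareto cutoff.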

Building the lemmatization dictionary

In [12]:
# Initialize the lemmatizer
lemmatizer = WordNetLemmatizer()

# Map the first letter of NLTK (Penn Treebank) POS tags to WordNet POS tags
tag_map = {
    'N': nltk.corpus.wordnet.NOUN,
    'V': nltk.corpus.wordnet.VERB,
    'R': nltk.corpus.wordnet.ADV,
    'J': nltk.corpus.wordnet.ADJ
}

# Lemmatize each unique (non-stopword) token
lista_lematizada = []
for palabra in lista_filtrada_plot:
    # POS tag of the word (tagged on its own, without sentence context)
    pos_tag = nltk.pos_tag([palabra])[0][1][0].upper()
    # Map it to a WordNet POS tag, defaulting to noun
    pos_tag = tag_map.get(pos_tag, nltk.corpus.wordnet.NOUN)

    # Lemmatize the word
    lema = lemmatizer.lemmatize(palabra, pos=pos_tag)
    lista_lematizada.append(lema)

# Print the lemmatized word list
print(lista_lematizada)

# Original token -> lemma lookup (both sequences follow the same iteration order of the set)
diccionario_original_a_lemas = {clave: valor for clave, valor in zip(lista_filtrada_plot, lista_lematizada)}

# Number of distinct lemmas
lista_unicos_lematizada = list(set(lista_lematizada))
len(lista_unicos_lematizada)

31371
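The lemmatization loop keeps only the first letter of each Penn Treebank tag returned by `nltk.pos_tag` and maps it to a WordNet part of speech. The sketch below isolates that mapping in plain Python, using the single-letter codes that NLTK's `wordnet` module defines (`NOUN = 'n'`, `VERB = 'v'`, `ADJ = 'a'`, `ADV = 'r'`); the function name `penn_to_wordnet` is hypothetical.

```python
# Single-letter WordNet POS codes, equal to nltk.corpus.wordnet.NOUN/VERB/ADJ/ADV
PENN_TO_WORDNET = {"N": "n", "V": "v", "J": "a", "R": "r"}

def penn_to_wordnet(penn_tag: str) -> str:
    """Map a Penn Treebank tag such as 'VBD' to a WordNet POS code, defaulting to noun."""
    return PENN_TO_WORDNET.get(penn_tag[:1].upper(), "n")

for tag in ["NNS", "VBD", "JJR", "RB", "DT"]:
    print(tag, "->", penn_to_wordnet(tag))
```

The returned code can be passed as `pos=` to `WordNetLemmatizer.lemmatize`, as the loop does with the `tag_map` values. Because each word is tagged in isolation, ambiguous words always receive the same context-free tag.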

Most common lemmatized words

In [13]:
# One row per token occurrence (stopwords removed), with its lemma
df_Listado_completo_filtrado_no_stopwords = pd.DataFrame(lista_palabras_completa_filtrada, columns=['words'])
df_Listado_completo_filtrado_no_stopwords['Lematized'] = df_Listado_completo_filtrado_no_stopwords['words'].apply(lambda x: diccionario_original_a_lemas[x])
df_Listado_completo_filtrado_no_stopwords['Count'] = 1
# Occurrences per lemma, most frequent first
Repeticiones_por_lematized = df_Listado_completo_filtrado_no_stopwords[['Lematized', 'Count']].groupby('Lematized').sum().reset_index().sort_values('Count', ascending=False)
# Drop single-character lemmas (punctuation, isolated letters)
filtro = Repeticiones_por_lematized['Lematized'].apply(lambda x: len(x) > 1)
resultados_filtrados = Repeticiones_por_lematized[filtro].reset_index(drop=True)
# Share of each lemma in all occurrences and its cumulative share (Pareto analysis)
Total_palabras = resultados_filtrados['Count'].sum()
resultados_filtrados['Part'] = resultados_filtrados['Count'].apply(lambda x: x/Total_palabras)
resultados_filtrados['Acumulado'] = resultados_filtrados['Part'].cumsum()
Porcentaje = 0.9
pareto = resultados_filtrados[resultados_filtrados['Acumulado'] <= Porcentaje].shape
print(str(Porcentaje*100) + '% of the word occurrences fall within the first ' + str(pareto[0]) + ' lemmas')

# Vocabulary option 1: the lemmas that together cover 90% of the occurrences
Palabras_Para_Clasificar_Lematizadas = resultados_filtrados[resultados_filtrados['Acumulado'] <= Porcentaje]['Lematized'].tolist()

vocabulario_lematizado = {palabra: indice for indice, palabra in enumerate(Palabras_Para_Clasificar_Lematizadas)}

resultados_filtrados.head(10)

90.0% of the word occurrences fall within the first 7558 lemmas


|  | Lematized | Count | Part | Acumulado |
|---|---|---|---|---|
| 0 | life | 3640 | 0.006784 | 0.006784 |
| 1 | one | 3068 | 0.005718 | 0.012502 |
| 2 | get | 3068 | 0.005718 | 0.018219 |
| 3 | find | 2875 | 0.005358 | 0.023577 |
| 4 | go | 2405 | 0.004482 | 0.028059 |
| 5 | new | 2271 | 0.004232 | 0.032292 |
| 6 | take | 2240 | 0.004175 | 0.036466 |
| 7 | friend | 2190 | 0.004081 | 0.040548 |
| 8 | year | 2085 | 0.003886 | 0.044434 |
| 9 | make | 2057 | 0.003834 | 0.048267 |
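The 90% cutoff above is a Pareto-style selection: lemmas are sorted by frequency and kept while their cumulative share of all occurrences stays at or below the threshold. A minimal sketch of that rule on hypothetical counts (the function name `pareto_vocabulary` is illustrative, not part of the notebook):

```python
import pandas as pd

def pareto_vocabulary(counts: pd.Series, share: float = 0.9) -> list:
    """Most frequent items whose cumulative share of all occurrences is <= share."""
    ordered = counts.sort_values(ascending=False)
    cumulative = ordered.cumsum() / ordered.sum()   # same idea as the 'Acumulado' column
    return cumulative[cumulative <= share].index.tolist()

# Hypothetical lemma counts (100 occurrences in total)
counts = pd.Series({"life": 50, "find": 30, "friend": 15, "pooh": 5})
print(pareto_vocabulary(counts, 0.9))   # cumulative shares: 0.5, 0.8, 0.95, 1.0
```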


Target variable

In [14]:
# Define the target variable (y): parse the stored genre lists and binarize them
import ast
dataTraining['genres'] = dataTraining['genres'].map(ast.literal_eval)  # safer than eval()
le = MultiLabelBinarizer()
y_genres = le.fit_transform(dataTraining['genres'])
y_genres

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 1, 0, 0],
       ...,
       [0, 1, 0, ..., 0, 0, 0],
       [0, 1, 1, ..., 0, 0, 0],
       [0, 1, 1, ..., 0, 0, 0]])
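The `genres` column stores each list as a string such as `"['Short', 'Drama']"`. A minimal sketch, on the first three training rows, of parsing it with `ast.literal_eval` (a safer substitute for `eval` that only accepts Python literals) and binarizing it with `MultiLabelBinarizer`. The binarizer sorts the labels alphabetically in `classes_`, which is why the `p_...` column lists in this notebook follow alphabetical order.

```python
import ast
from sklearn.preprocessing import MultiLabelBinarizer

# Genre lists stored as strings, as in the CSV (first three training rows)
raw = ["['Short', 'Drama']", "['Comedy', 'Crime', 'Horror']", "['Drama', 'Film-Noir', 'Thriller']"]
genres = [ast.literal_eval(s) for s in raw]   # only parses literals, never runs code

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(genres)                 # one 0/1 column per genre
print(list(mlb.classes_))
print(y)
```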

Lemmatize the original text and remove stopwords

In [15]:
def lematizar_texto(texto, diccionario):
    palabras = texto.split()  # Split the text into words
    palabras_lematizadas = [diccionario.get(palabra, palabra) for palabra in palabras]  # Look up each word's lemma (unknown words are kept as-is)
    palabras_filtradas = [palabra for palabra in palabras_lematizadas if palabra.lower() not in stopwords_english]  # Remove stopwords
    texto_lematizado = ' '.join(palabras_filtradas)  # Join the lemmas back into a single text
    return texto_lematizado

dataTraining_min['plot_lematized'] = dataTraining_min['plot'].apply(lambda x: lematizar_texto(x, diccionario_original_a_lemas))
dataTraining_min.head()


|  | year | title | plot | genres | rating | plot_lematized |
|---|---|---|---|---|---|---|
| 3107 | 2003 | most | most is the story of a single father who takes... | ['Short', 'Drama'] | 8.0 | story single father take eight year - old son ... |
| 900 | 2008 | how to be a serial killer | a serial killer decides to teach the secrets o... | ['Comedy', 'Crime', 'Horror'] | 5.6 | serial killer decides teach secret satisfy car... |
| 6724 | 1941 | a woman's face | in sweden , a female blackmailer with a disfi... | ['Drama', 'Film-Noir', 'Thriller'] | 7.2 | sweden , female blackmailer disfigure facial s... |
| 4704 | 1954 | executive suite | in a friday afternoon in new york , the presi... | ['Drama'] | 7.4 | friday afternoon new york , president tredway ... |
| 2582 | 1990 | narrow margin | in los angeles , the editor of a publishing h... | ['Action', 'Crime', 'Thriller'] | 6.6 | los angeles , editor publishing house carol hu... |


Word counts per genre

In [16]:
cols = ['p_Action', 'p_Adventure', 'p_Animation', 'p_Biography', 'p_Comedy', 'p_Crime', 'p_Documentary', 'p_Drama', 'p_Family',
        'p_Fantasy', 'p_Film-Noir', 'p_History', 'p_Horror', 'p_Music', 'p_Musical', 'p_Mystery', 'p_News', 'p_Romance',
        'p_Sci-Fi', 'p_Short', 'p_Sport', 'p_Thriller', 'p_War', 'p_Western']

dataTraining_plot = dataTraining_min['plot_lematized'].reset_index(drop=True)

# One row per movie: lemmatized plot plus one 0/1 column per genre
df_y_genres = pd.DataFrame(y_genres, columns=cols)
df_a_evaluar = pd.concat((dataTraining_plot, df_y_genres), axis=1)

# Words x genres table with the number of occurrences of each word in each genre
participaciones = pd.DataFrame(index=vocabulario_lematizado, columns=df_a_evaluar.columns[1:])
contador = 1
actual = 0.05
for palabra in vocabulario_lematizado:
    # Report progress every 5%
    completado = contador / len(vocabulario_lematizado)
    if completado > actual:
        print('% Completed: ' + str(round(actual*100, 0)))
        actual += 0.05

    for genero in df_a_evaluar.columns[1:]:
        # Occurrences of the word in the plots of this genre.
        # Caveat: str.count treats `palabra` as a regex and also matches it inside
        # longer words (e.g. 'rd' in 'word'), which inflates the counts of short lemmas.
        freq = df_a_evaluar[df_a_evaluar[genero] == 1]['plot_lematized'].str.count(palabra).sum()
        # Store the count in the DataFrame
        participaciones.at[palabra, genero] = freq
    contador += 1
participaciones.head()

joblib.dump(participaciones, 'Participaciones2.pkl', compress=3)

% Completed: 5.0
% Completed: 10.0
% Completed: 15.0
% Completed: 20.0
% Completed: 25.0
% Completed: 30.0
% Completed: 35.0
% Completed: 40.0
% Completed: 45.0
% Completed: 50.0
% Completed: 55.0
% Completed: 60.0
% Completed: 65.0
% Completed: 70.0
% Completed: 75.0
% Completed: 80.0
% Completed: 85.0
% Completed: 90.0
% Completed: 95.0


['Participaciones2.pkl']
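The counting loop relies on `Series.str.count(palabra)`, which treats the word as a regular expression and counts matches anywhere in the text, including inside longer words. A minimal sketch of whole-token counting, assuming the lemmatized plots are whitespace-separated: a `CountVectorizer` with a fixed vocabulary and `token_pattern=r"\S+"` builds the document-term matrix, and multiplying its transpose by the 0/1 genre matrix sums the counts per genre in one step. The function name `counts_per_genre` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

def counts_per_genre(plots, Y, vocabulary):
    """Whole-token counts of each vocabulary word within each genre.

    plots: lemmatized plot strings; Y: (n_docs, n_genres) 0/1 array.
    Returns an (n_words, n_genres) array.
    """
    # token_pattern=r"\S+" mimics str.split(); lowercase=False keeps tokens unchanged
    vec = CountVectorizer(vocabulary=vocabulary, token_pattern=r"\S+", lowercase=False)
    X = vec.transform(plots)                # sparse (n_docs, n_words) term counts
    return np.asarray(X.T @ Y)              # sum of each word's counts over each genre's documents

# Hypothetical plots, vocabulary and two genre columns
plots = ["killer want word", "want killer killer", "board game"]
vocab = ["killer", "want", "rd"]
Y = np.array([[1, 0], [1, 1], [0, 1]])

print(counts_per_genre(plots, Y, vocab))
# Substring counting finds 'rd' inside 'word' and 'board'
print(pd.Series(plots).str.count("rd").tolist())
```

Besides matching whole tokens only, the matrix product replaces roughly vocabulary size times 24 filtered passes over the DataFrame with a single sparse multiplication.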

Weighted counts

In [17]:
# Share of each word's occurrences that falls in each genre (each row sums to 1)
suma_filas = participaciones.sum(axis=1)
participaciones_por_fila = participaciones.div(suma_filas, axis=0)
participaciones_por_fila.head()

|  | p_Action | p_Adventure | p_Animation | p_Biography | p_Comedy | p_Crime | p_Documentary | p_Drama | p_Family | p_Fantasy | ... | p_Musical | p_Mystery | p_News | p_Romance | p_Sci-Fi | p_Short | p_Sport | p_Thriller | p_War | p_Western |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| life | 0.039037 | 0.035613 | 0.009197 | 0.023481 | 0.152725 | 0.055865 | 0.016926 | 0.247627 | 0.027395 | 0.034439 | ... | 0.010664 | 0.028079 | 0.000391 | 0.126113 | 0.024264 | 0.001272 | 0.012621 | 0.072987 | 0.013404 | 0.00724 |
| one | 0.066641 | 0.050104 | 0.008896 | 0.014846 | 0.131754 | 0.075483 | 0.010479 | 0.187698 | 0.025925 | 0.031001 | ... | 0.012171 | 0.042517 | 0.000164 | 0.08891 | 0.032256 | 0.00262 | 0.011243 | 0.10512 | 0.017465 | 0.011898 |
| get | 0.063739 | 0.046315 | 0.010045 | 0.007912 | 0.17886 | 0.076007 | 0.005512 | 0.177882 | 0.034314 | 0.028536 | ... | 0.013868 | 0.031025 | 0.0 | 0.109965 | 0.025336 | 0.002845 | 0.012179 | 0.09183 | 0.014046 | 0.009956 |
| find | 0.062366 | 0.057486 | 0.014639 | 0.009284 | 0.131516 | 0.067841 | 0.003809 | 0.163413 | 0.03761 | 0.040109 | ... | 0.012378 | 0.051892 | 0.0 | 0.084742 | 0.041895 | 0.001904 | 0.007498 | 0.107593 | 0.012973 | 0.00976 |
| go | 0.07375 | 0.059133 | 0.011778 | 0.012563 | 0.14309 | 0.070669 | 0.009181 | 0.172203 | 0.035335 | 0.039865 | ... | 0.011718 | 0.032737 | 0.000302 | 0.084803 | 0.034972 | 0.002718 | 0.012745 | 0.097065 | 0.012986 | 0.012443 |


TF-IDF computation

In [18]:
joblib.dump(participaciones, 'Participaciones.pkl', compress=3)

# Words x genres table for the TF-IDF scores (same shape as participaciones)
tf_idf_values = participaciones * 0
# Total number of genres; each genre is treated as one "document"
total_genres = len(df_a_evaluar.columns) - 1

# TF-IDF of every word in every genre
for genre in df_a_evaluar.columns[1:]:
    # Total length of the genre's plots. Note: apply(len) counts characters, not words
    total_words = df_a_evaluar[df_a_evaluar[genre] == 1]['plot_lematized'].apply(len).sum()
    for word in participaciones.index:
        freq = participaciones.loc[word, genre]
        tf = freq / total_words
        num_genres_with_word = participaciones.loc[word].gt(0).sum()  # number of genres in which the word appears
        idf = math.log(total_genres / num_genres_with_word)
        tf_idf = tf * idf
        tf_idf_values.loc[word, genre] = tf_idf

# Words with the highest TF-IDF summed over all genres
Valores_Tf_Idf_Mas_Altos = tf_idf_values.sum(axis=1).sort_values(ascending=False)
Valores_Tf_Idf_Mas_Altos.head(10)

rd      0.001688
ant     0.001321
pi      0.001206
pooh    0.000944
get     0.000942
lt      0.000830
fu      0.000797
inc     0.000756
gu      0.000743
find    0.000739
dtype: float64
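The nested loop treats each genre as one document: tf is the word's count in the genre divided by the genre's length, and idf is `log(G / n_w)`, where `G` is the number of genres and `n_w` the number of genres in which the word appears. A vectorized sketch of the same formula with pandas, assuming a numeric words x genres count table and a Series of tokens per genre (hypothetical values here; for token rather than character lengths, `.str.split().str.len()` could replace `apply(len)`). Words that appear in no genre get a score of 0 instead of NaN.

```python
import numpy as np
import pandas as pd

def genre_tfidf(counts: pd.DataFrame, tokens_per_genre: pd.Series) -> pd.DataFrame:
    """TF-IDF where each genre plays the role of a document.

    counts: words x genres occurrence counts; tokens_per_genre: tokens per genre.
    """
    counts = counts.astype(float)
    tf = counts.div(tokens_per_genre, axis=1)               # term frequency inside each genre
    n_genres = (counts > 0).sum(axis=1).replace(0, np.nan)  # genres containing each word
    idf = np.log(counts.shape[1] / n_genres)                # log(G / n_w), as in the loop
    return tf.mul(idf, axis=0).fillna(0.0)                  # unseen words score 0

# Hypothetical counts for two words and two genres
counts = pd.DataFrame({"p_Crime": [3, 0], "p_Comedy": [2, 4]}, index=["killer", "wedding"])
tokens_per_genre = pd.Series({"p_Crime": 10, "p_Comedy": 20})
print(genre_tfidf(counts, tokens_per_genre))
```

With only 24 "documents", any word present in every genre gets idf = log(1) = 0, as `killer` does here, so this score favors words concentrated in a few genres.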

Per-genre TF-IDF word analysis

In [35]:
TF_IDF_Mas_Alto_Por_Genero = {}
Listado_palabras_unicas_por_Tf_Idf = []
max_palabras = 500
# Keep the max_palabras words with the highest TF-IDF in each genre
for i in tf_idf_values.columns:
    Words_selected = tf_idf_values[i].sort_values(ascending=False).index[:max_palabras].tolist()
    TF_IDF_Mas_Alto_Por_Genero[i] = Words_selected
    Listado_palabras_unicas_por_Tf_Idf.extend(Words_selected)
# Union over genres; sorted() keeps the column order reproducible across sessions
Listado_palabras_unicas_por_Tf_Idf = sorted(set(Listado_palabras_unicas_por_Tf_Idf))
# Vocabulary option 2: word -> column index
Nuevo_vocabulario_por_Tf_Idf = {palabra: indice for indice, palabra in enumerate(Listado_palabras_unicas_por_Tf_Idf)}

print(len(TF_IDF_Mas_Alto_Por_Genero), len(Listado_palabras_unicas_por_Tf_Idf), len(Nuevo_vocabulario_por_Tf_Idf))

24 4135 4135
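Building the second vocabulary amounts to taking the union of the `max_palabras` highest-scoring words of every genre. A minimal sketch with `Series.nlargest`, assuming a words x genres score table with hypothetical values. It casts to float first because `nlargest` rejects object-dtype columns. It also sorts the union, since the iteration order of a set of strings can change between Python sessions (string hashing is randomized), which would silently reorder the vectorizer's columns.

```python
import pandas as pd

def topk_union_vocabulary(scores: pd.DataFrame, k: int) -> dict:
    """Union of the k highest-scoring words of every column, as a word -> index dict."""
    scores = scores.astype(float)           # nlargest needs a numeric dtype
    selected = set()
    for genre in scores.columns:
        selected.update(scores[genre].nlargest(k).index)
    # sorted(): same column order in every session
    return {word: i for i, word in enumerate(sorted(selected))}

# Hypothetical TF-IDF scores
scores = pd.DataFrame({"p_Action": [0.5, 0.1, 0.0, 0.4], "p_Family": [0.0, 0.2, 0.9, 0.0]},
                      index=["ant", "life", "pooh", "war"])
print(topk_union_vocabulary(scores, k=2))
```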


Vectorization

In [36]:
# Vocabulary options: 1 = lemmas covering 90% of occurrences, 2 = per-genre top TF-IDF words
Opciones_para_Vocabularios = {1: vocabulario_lematizado, 2: Nuevo_vocabulario_por_Tf_Idf}
vectorizer = CountVectorizer(vocabulary=Opciones_para_Vocabularios[2])  # option 2
# Bag-of-words count matrix of the lemmatized plots
xplot_vectorizer = vectorizer.transform(dataTraining_min['plot_lematized'])
features = vectorizer.get_feature_names_out()
print(xplot_vectorizer.shape)
xplot_vectorizer

(7895, 4135)


<7895x4135 sparse matrix of type '<class 'numpy.int64'>'
	with 190085 stored elements in Compressed Sparse Row format>

In [37]:
# Split predictors (X) and target (y) into training and test sets with train_test_split
X_train, X_test, y_train_genres, y_test_genres = train_test_split(xplot_vectorizer, y_genres, test_size=0.33, random_state=42)

In [38]:
# Define and train the model (one XGBoost classifier per genre)
clf = OneVsRestClassifier(XGBClassifier(n_jobs=-1))
clf.fit(X_train, y_train_genres)

In [39]:
# Predicted probability of each genre on the held-out split
y_pred_genres = clf.predict_proba(X_test)

# AUC of each class
auc_scores = roc_auc_score(y_test_genres, y_pred_genres, average=None)

# Macro-averaged AUC over all genres
AUC_TOTAL = roc_auc_score(y_test_genres, y_pred_genres, average='macro')
print('AUC_TOTAL: ' + str(AUC_TOTAL))

# Table of per-genre AUC, worst first
Para_df = {}
for class_idx, auc_score in enumerate(auc_scores):
    Para_df[cols[class_idx]] = auc_score
DF_AUC_Generos = pd.DataFrame(Para_df.values(), columns=['AUC'], index=Para_df.keys())
DF_AUC_Generos = DF_AUC_Generos.sort_values(by='AUC')

# Save the model together with its scores
Estimador_Vs_AUCS = [clf, AUC_TOTAL, DF_AUC_Generos]

joblib.dump(Estimador_Vs_AUCS, 'Lista_2_params_por_defecto_25-05-23 AUC' + str(AUC_TOTAL) + '.pkl', compress=3)

DF_AUC_Generos

AUC_TOTAL: 0.8243424641409606


|  | AUC |
|---|---|
| p_News | 0.687668 |
| p_Drama | 0.715138 |
| p_Biography | 0.751527 |
| p_Comedy | 0.773998 |
| p_Musical | 0.775834 |
| p_History | 0.776804 |
| p_Romance | 0.780851 |
| p_Thriller | 0.786216 |
| p_Film-Noir | 0.791339 |
| p_Mystery | 0.797888 |
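The per-genre AUC table can be built more directly with a `pd.Series` indexed by the genre columns. A minimal sketch on hypothetical labels and scores, which also shows that `average='macro'` in `roc_auc_score` is the unweighted mean of the per-class values returned by `average=None`; the helper name `auc_report` is illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def auc_report(y_true, y_score, labels):
    """Per-class ROC AUC as a Series sorted from worst to best, plus the macro average."""
    per_class = pd.Series(roc_auc_score(y_true, y_score, average=None), index=labels)
    return per_class.sort_values(), per_class.mean()

# Hypothetical multi-label targets and predicted probabilities (4 movies, 2 genres)
y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y_score = np.array([[0.9, 0.2], [0.1, 0.8], [0.8, 0.15], [0.3, 0.1]])

per_class, macro = auc_report(y_true, y_score, ["p_Action", "p_Drama"])
print(per_class)
print(macro, roc_auc_score(y_true, y_score, average="macro"))
```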


Neural network attempt 1

In [12]:
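from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.activations import relu, swish, sigmoid

model = Sequential()
model.add(Input(shape=(X_train.shape[1],)))
model.add(Dense(669, activation=relu))
model.add(Dropout(0.10769379659079174))
model.add(Dense(96, activation=swish))
model.add(Dropout(0.022278422538476006))
# One sigmoid output per genre (24 independent probabilities)
model.add(Dense(24, activation=sigmoid))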

Neural network attempt 2: LSTM

In [11]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, LSTM, Dropout, Dense

# Define the network with Sequential()
model = Sequential()

# Note: Embedding + LSTM expect each row to be a sequence of token indices;
# X_train holds bag-of-words counts, so it would have to be re-encoded first.
model.add(Input(shape=(X_train.shape[1],)))
# Embedding layer
model.add(Embedding(len(Opciones_para_Vocabularios[2]) + 1, 128))
# LSTM recurrent layer
model.add(LSTM(32))
# Dropout to reduce overfitting
model.add(Dropout(0.5))
# Dense output layer with sigmoid: one independent probability per genre
model.add(Dense(24, activation='sigmoid'))

# Loss function and optimizer
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

# Print the network architecture
model.summary()
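An `Embedding` + `LSTM` stack expects each row to be a sequence of token indices rather than the bag-of-words counts stored in `X_train`. A minimal NumPy sketch of that encoding, assuming whitespace-tokenized lemmatized plots and a word -> index dictionary like the notebook's vocabularies. Indices are shifted by one so that 0 can be reserved for padding, which matches an embedding input size of `len(vocabulary) + 1`. Out-of-vocabulary tokens are simply skipped, and each sequence is truncated or right-padded to `max_len`. The function name and the `max_len` value are hypothetical.

```python
import numpy as np

def texts_to_padded_sequences(texts, vocabulary, max_len):
    """Encode texts as rows of token indices, truncated or right-padded with 0 to max_len.

    Indices are vocabulary[token] + 1, so 0 is reserved for padding;
    tokens missing from the vocabulary are skipped.
    """
    out = np.zeros((len(texts), max_len), dtype=np.int64)
    for row, text in enumerate(texts):
        ids = [vocabulary[tok] + 1 for tok in text.split() if tok in vocabulary][:max_len]
        out[row, :len(ids)] = ids
    return out

# Hypothetical word -> index vocabulary
vocab = {"killer": 0, "teach": 1, "secret": 2}
seqs = texts_to_padded_sequences(["serial killer teach secret", "teach"], vocab, max_len=3)
print(seqs)
```

In Keras, passing `mask_zero=True` to `Embedding` lets the following LSTM ignore the padded positions.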

Test set transformation

In [None]:
# Apply the training preprocessing to the test set: lowercase, lemmatize, vectorize with the same vocabulary
dataTesting_min = minusculizar(dataTesting)
dataTesting_min['plot_lematized'] = dataTesting_min['plot'].apply(lambda x: lematizar_texto(x, diccionario_original_a_lemas))
xplot_vectorizer_test = vectorizer.transform(dataTesting_min['plot_lematized'])
xplot_vectorizer_test.shape

(3383, 3584)

In [None]:
# Column names of the submission file: one probability per genre, in the order of le.classes_
cols = ['p_Action', 'p_Adventure', 'p_Animation', 'p_Biography', 'p_Comedy', 'p_Crime', 'p_Documentary', 'p_Drama', 'p_Family',
        'p_Fantasy', 'p_Film-Noir', 'p_History', 'p_Horror', 'p_Music', 'p_Musical', 'p_Mystery', 'p_News', 'p_Romance',
        'p_Sci-Fi', 'p_Short', 'p_Sport', 'p_Thriller', 'p_War', 'p_Western']
assert cols == ['p_' + g for g in le.classes_]

# Predict genre probabilities for the test set
y_pred_test_genres = clf.predict_proba(xplot_vectorizer_test)

In [None]:
# Save the predictions in the format required by the Kaggle competition
res = pd.DataFrame(y_pred_test_genres, index=dataTesting.index, columns=cols)
res.to_csv('pred_genres_text_XGBoost_Tfidf.csv', index_label='ID')
res.head()

|  | p_Action | p_Adventure | p_Animation | p_Biography | p_Comedy | p_Crime | p_Documentary | p_Drama | p_Family | p_Fantasy | ... | p_Musical | p_Mystery | p_News | p_Romance | p_Sci-Fi | p_Short | p_Sport | p_Thriller | p_War | p_Western |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.050803 | 0.062708 | 0.0127 | 0.059062 | 0.390652 | 0.032782 | 0.018333 | 0.417732 | 0.046673 | 0.042374 | ... | 0.003583 | 0.058935 | 0.000163 | 0.693775 | 0.010411 | 0.001878 | 0.00495 | 0.131452 | 0.017716 | 0.000277 |
| 4 | 0.033306 | 0.005993 | 0.008762 | 0.480497 | 0.250081 | 0.263188 | 0.056525 | 0.653771 | 0.008383 | 0.006424 | ... | 0.002775 | 0.025602 | 0.000163 | 0.050535 | 0.027869 | 0.00671 | 0.001096 | 0.252321 | 0.00576 | 0.002395 |
| 5 | 0.031645 | 0.006459 | 5e-06 | 0.014184 | 0.051633 | 0.779883 | 0.000192 | 0.897102 | 0.0004 | 0.00422 | ... | 0.000223 | 0.364214 | 0.000136 | 0.148881 | 0.037517 | 1.7e-05 | 0.001915 | 0.379841 | 0.000239 | 0.003715 |
| 6 | 0.029862 | 0.029018 | 0.000143 | 0.016154 | 0.177892 | 0.042458 | 0.012441 | 0.846976 | 0.001682 | 0.027212 | ... | 0.004678 | 0.079355 | 0.000136 | 0.120675 | 0.011558 | 0.000114 | 0.003584 | 0.40548 | 0.026445 | 0.000592 |
| 7 | 0.247742 | 0.074026 | 0.001366 | 0.001818 | 0.177133 | 0.183721 | 0.003259 | 0.098827 | 0.057766 | 0.490959 | ... | 0.001049 | 0.01153 | 0.000136 | 0.007641 | 0.854693 | 0.000179 | 0.000582 | 0.160666 | 0.003061 | 0.003086 |
