# Machine Learning Analysis for Performance Prediction in 'La Más Draga'

<a id="table_of_contents"></a>

### Table of contents

<ol>
  <li><a href="#overview">Situation Overview</a>
  <ul>
   <li><a href='#dataset'>Examine the overall dataset
    </a></li>
    <li><a href='#cleaning'>Execute cleaning procedures
    </a></li>
    
  </ul>
  </li>
  
  <li><a href="#eda">Exploratory analysis</a>
  <ul>
    <li><a href='#unique_values'>Begin by computing distinct values
    </a></li>
    <li><a href='#apps_by_category'>Find out the number of apps by category
    </a></li>
    </ul>
  </li>  
  
  <li><a href="#analysis">Data analysis</a>
  <ul>
    <li><a href='#free_paid_apps'>Comparing ratings, reviews and installs between free and paid apps
    </a></li>
    </ul>
  </li>  
  <li><a href="#insights">Insights</a></li>
</ol>

<a id="overview"></a>
## Situation Overview
This notebook uses machine learning to predict how my favorite contestant will perform on "La Más Draga", a competition similar to RuPaul's Drag Race. It addresses three questions:

1. From our data, what are the significant factors contributing to competition results?
2. Which participants compete in the same category as my favorite participant?
3. Does my favorite participant have a good chance of winning the competition?

<a href="#table_of_contents">Navigate to contents</a>

<a id="dataset"></a>
### Examine the overall dataset
<br>
<a href="#table_of_contents">Navigate to contents</a>

In [2]:
#import python libraries
import pandas as pd
import mysql.connector
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [3]:
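# Connect to the MySQL database.
# The password is read from an environment variable rather than hard-coded;
# MYSQL_PASSWORD is an assumed variable name, set it before running this cell.
import os

conn = mysql.connector.connect(
    host="localhost",
    user="root",
    password=os.environ["MYSQL_PASSWORD"],
    database="project",
)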
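
The cells that follow pass a SQL string and an open connection to `pd.read_sql`. As a minimal sketch of that call shape that runs without a MySQL server, the example below uses an in-memory SQLite database as a stand-in. The `demo_conn` and `demo_df` names and the reduced three-column table are assumptions made for illustration; the two rows reuse values printed later in this notebook.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database, used purely as a stand-in for the MySQL server.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE la_mas_draga (Participante TEXT, Edad INTEGER, Temporada INTEGER)"
)
demo_conn.executemany(
    "INSERT INTO la_mas_draga VALUES (?, ?, ?)",
    [("Bárbara Durango", 40, 1), ("Eva Blunt", 36, 1)],
)
demo_conn.commit()

# Same call shape as the notebook: a SQL string plus an open connection.
demo_df = pd.read_sql("SELECT * FROM la_mas_draga;", demo_conn)
demo_conn.close()

print(demo_df)
```

Closing the connection once the data is loaded, as done here, is a habit worth carrying over to the MySQL connection as well.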

In [17]:
# participants table
df_table_1 = pd.read_sql("SELECT * FROM la_mas_draga;", conn)

print(df_table_1.head())


  Lugar         Participante              Nombre Lugar de residencia  Edad  \
0     1  Deborah "La Grande"       Ramses Molina    Ciudad de México    35   
1     2      Bárbara Durango    Ricardo Martínez    Ciudad de México    40   
2     3            Eva Blunt  Pablo Levy Morales    Ciudad de México    36   
3     4        Margaret Y Ya    Margaret Peltier    Ciudad de México    29   
4     5         Lana Boswell  Alan de Jesús Cruz    Ciudad de México    30   

   Retos ganados     Resultado  Temporada Selección  
0              0  Ganadora[7]​          1      None  
1              1    Finalistas          1      None  
2              1    Finalistas          1      None  
3              2    Finalistas          1      None  
4              0  3ª eliminada          1      None  


In [18]:
df_table_1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 70 entries, 0 to 69
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Lugar                70 non-null     object
 1   Participante         70 non-null     object
 2   Nombre               70 non-null     object
 3   Lugar de residencia  70 non-null     object
 4   Edad                 70 non-null     int64 
 5   Retos ganados        70 non-null     int64 
 6   Resultado            59 non-null     object
 7   Temporada            70 non-null     int64 
 8   Selección            63 non-null     object
dtypes: int64(3), object(6)
memory usage: 5.0+ KB
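
The `info()` output above reports 59 non-null values out of 70 for `Resultado` and 63 for `Selección`, that is 11 and 7 missing entries. A direct way to get those counts is `isna().sum()`. The sketch below shows the idea on a small made-up frame; the `toy` rows are illustrative and are not taken from the database.

```python
import pandas as pd

# Made-up rows that mimic the two columns with gaps in the info() output.
toy = pd.DataFrame(
    {
        "Participante": ["A", "B", "C", "D"],
        "Resultado": ["Finalistas", None, "1ª eliminada", None],
        "Selección": [None, "Audiciones", "Audiciones", "Audiciones"],
    }
)

# Number of missing entries per column, largest first.
missing_counts = toy.isna().sum().sort_values(ascending=False)
print(missing_counts)
```

Applied to the real data, the equivalent call would be `df_table_1.isna().sum()`.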


In [20]:
# progress table
df_table_2 = pd.read_sql("SELECT * FROM la_mas_draga_progress;", conn)

print(df_table_2.head())

       Concursante  Episodio Nombre_de_episodio Progreso
0  Bárbara Durango         1               Diva     BAJA
1  Bárbara Durango         1               Diva     BAJA
2  Bárbara Durango         1               Diva     BAJA
3  Bárbara Durango         1               Diva     BAJA
4  Bárbara Durango         1               Diva     BAJA



In [21]:
df_table_2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 294 entries, 0 to 293
Data columns (total 4 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Concursante         294 non-null    object
 1   Episodio            294 non-null    int64 
 2   Nombre_de_episodio  294 non-null    object
 3   Progreso            264 non-null    object
dtypes: int64(1), object(3)
memory usage: 9.3+ KB
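
The five rows printed by `head()` for the progress table are identical, which suggests the table may contain duplicated records. One possible check, sketched here on made-up data, counts fully duplicated rows and drops them. The `toy_progress` frame, the placeholder contestant name, and the `ALTA` value are assumptions for illustration only.

```python
import pandas as pd

# Made-up progress rows: three identical records plus one distinct record.
toy_progress = pd.DataFrame(
    {
        "Concursante": ["Bárbara Durango"] * 3 + ["Otra concursante"],
        "Episodio": [1, 1, 1, 1],
        "Nombre_de_episodio": ["Diva"] * 4,
        "Progreso": ["BAJA", "BAJA", "BAJA", "ALTA"],
    }
)

# duplicated() marks every repeat of an earlier row (the first copy is kept).
n_dupes = int(toy_progress.duplicated().sum())

# drop_duplicates() keeps one copy of each distinct row.
deduped = toy_progress.drop_duplicates().reset_index(drop=True)

print(n_dupes)
print(deduped)
```

Whether the repeats in `df_table_2` are genuine duplicates or an artifact of how the table was loaded would need to be confirmed against the source before dropping anything.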


<a id="cleaning"></a>
### Execute cleaning procedures
<br>
<a href="#table_of_contents">Navigate to contents</a>

In [22]:
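# Create the working data set: keep only rows with Temporada < 6.
# .copy() makes df_contestants independent of df_table_1, so the later
# column assignments do not trigger SettingWithCopyWarning.
df_contestants = df_table_1.loc[df_table_1["Temporada"] < 6].copy()
print(df_contestants.tail())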

   Lugar         Participante          Nombre    Lugar de residencia  Edad  \
51     8      Aisha Dollkills  Andrey Miranda   San José, Costa Rica    21   
52     9           Light King     Juan Correa         Cali, Colombia    26   
53    10            Huma Kyle     Tony Porras       Chihuahua, Chih.    29   
54    11  Isabella y Catalina   Alex y Carlos  San Diego, California    25   
55    12           Deseos Fab  Alexis Vázquez       Ciudad de México    24   

    Retos ganados     Resultado  Temporada      Selección  
51              0  4ª Eliminada          5     Audiciones  
52              0      Abandona          5    Secretísima  
53              0  3ª Eliminada          5     Audiciones  
54              0  2ª Eliminada          5   Secretísimas  
55              0  1ª Eliminada          5  La más Votada  


In [24]:
#change data type
df_contestants["Lugar"] = df_contestants["Lugar"].astype(int)
df_contestants["Lugar de residencia"] = df_contestants["Lugar de residencia"].astype(str)
df_contestants["Resultado"] = df_contestants["Resultado"].astype(str)
df_contestants.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 56 entries, 0 to 55
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Lugar                56 non-null     int32 
 1   Participante         56 non-null     object
 2   Nombre               56 non-null     object
 3   Lugar de residencia  56 non-null     object
 4   Edad                 56 non-null     int64 
 5   Retos ganados        56 non-null     int64 
 6   Resultado            56 non-null     object
 7   Temporada            56 non-null     int64 
 8   Selección            49 non-null     object
dtypes: int32(1), int64(3), object(5)
memory usage: 4.2+ KB
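
The conversions above use `astype(int)` and `astype(str)`. Two caveats may be worth keeping in mind: `astype(int)` raises if any entry is not a valid integer, and in the pandas generation this notebook appears to use (the `Int64Index` in the output suggests a pre-2.0 release) `astype(str)` turns a missing value into the literal text `"None"`, so `info()` then reports it as non-null. The sketch below shows a more defensive alternative on made-up data; the `toy` frame and its `"n/a"` entry are assumptions for illustration.

```python
import pandas as pd

# Made-up rows: one unparseable rank and one missing result.
toy = pd.DataFrame(
    {
        "Lugar": ["1", "2", "n/a"],
        "Resultado": ["Finalistas", None, "Abandona"],
    }
)

# errors="coerce" turns unparseable entries into NaN instead of raising.
lugar_num = pd.to_numeric(toy["Lugar"], errors="coerce")

# The nullable "string" dtype keeps missing values missing.
resultado_str = toy["Resultado"].astype("string")

print(lugar_num)
print(resultado_str.isna().sum())
```

Counting the NaN values produced by the coercion is a quick way to see how many entries would have failed a strict `astype(int)`.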



In [28]:
#fill null values
df_contestants["Selección"] = df_contestants["Selección"].fillna("Audiciones")
df_contestants.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 56 entries, 0 to 55
Data columns (total 10 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Lugar                56 non-null     int32 
 1   Participante         56 non-null     object
 2   Nombre               56 non-null     object
 3   Lugar de residencia  56 non-null     object
 4   Edad                 56 non-null     int64 
 5   Retos ganados        56 non-null     int64 
 6   Resultado            56 non-null     object
 7   Temporada            56 non-null     int64 
 8   Selección            56 non-null     object
 9   Seleccion            56 non-null     object
dtypes: int32(1), int64(3), object(6)
memory usage: 4.6+ KB
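
The `info()` output above lists ten columns, including both `Selección` and `Seleccion`. The unaccented column is not created by any cell shown here, so it presumably comes from an earlier run of the notebook. A small check like the following sketch can flag column names that differ only by accents or case; the `strip_accents` helper and the shortened `columns` list are assumptions for illustration.

```python
import unicodedata
from collections import defaultdict


def strip_accents(text: str) -> str:
    """Return text with combining accent marks removed."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Shortened list of column names taken from the info() output.
columns = ["Lugar", "Participante", "Selección", "Seleccion"]

# Group names by their accent-free, lower-case form.
groups = defaultdict(list)
for name in columns:
    groups[strip_accents(name).lower()].append(name)

# Keep only the groups where more than one column maps to the same key.
collisions = {key: names for key, names in groups.items() if len(names) > 1}
print(collisions)
```

With a real frame, `columns` would be `list(df_contestants.columns)`, and any collision found could be resolved by dropping or merging the stray column.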


