<center><h1>Analysis of world obesity data</h1></center>
<center><img src='https://www.crf-ro.org.br/wp-content/uploads/2018/10/OBESIDADE.jpg'></center>

### What is obesity?
Obesity is a medical condition caused by the accumulation of fat in different parts of the human body.

Obesity has several causes, the main one being excessive consumption of calories from food. In general, fat accumulates when there is an imbalance between the energy taken in through meals and the energy the body spends on day-to-day activities.

To determine whether a person is obese, the Body Mass Index (BMI) is calculated by dividing the patient's weight in kilograms by the square of their height in metres. The resulting value is looked up in a table with ranges for underweight, normal weight, overweight, obesity class I, obesity class II, and obesity class III.
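
As a worked example of the calculation described above, the sketch below computes BMI and maps it to a category. The function names `bmi` and `bmi_category` are hypothetical, and the cut-off values are the commonly used WHO adult thresholds, which the source text does not list, so treat them as an assumption.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Map a BMI value to an adult category (assumed WHO cut-offs)."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    if value < 35:
        return "obesity class I"
    if value < 40:
        return "obesity class II"
    return "obesity class III"


# Example: an adult weighing 95 kg who is 1.75 m tall
example = bmi(95.0, 1.75)
print(round(example, 1), bmi_category(example))
```

These thresholds apply to adults only; childhood BMI is read against age-specific reference values.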

According to data from the Ministry of Health, obesity in Brazil already affects about 18.9% of the population.

Childhood obesity is also assessed with BMI (using reference values that differ from those for adults), and the condition needs close medical attention because it can have consequences in adult life. [source]
https://www.rededorsaoluiz.com.br/doencas/obesidade

In [6]:
# 1 import the libraries
import pandas as pd
import numpy as np
import plotly.express as px

In [7]:
# 2 load the dataset
df = pd.read_csv('/content/obesity_cleaned.csv')

In [8]:
# 3 check the dimensions of the dataset
print(f"The dataset has {df.shape[0]} rows and {df.shape[1]} columns.")

The dataset has 24570 rows and 5 columns.


In [9]:
# 4 check the first 5 records
df.head()

Unnamed: 0.1,Unnamed: 0,Country,Year,Obesity (%),Sex
0,0,Afghanistan,1975,0.5 [0.2-1.1],Both sexes
1,1,Afghanistan,1975,0.2 [0.0-0.6],Male
2,2,Afghanistan,1975,0.8 [0.2-2.0],Female
3,3,Afghanistan,1976,0.5 [0.2-1.1],Both sexes
4,4,Afghanistan,1976,0.2 [0.0-0.7],Male


In [10]:
# 5 drop the unnecessary column
try:
  df.drop('Unnamed: 0', axis=1, inplace=True)
except KeyError:
  # raised when the column was already dropped (e.g. the cell was re-run)
  print('column does not exist')

In [11]:
# 6 check the summary statistics
df.describe()

Unnamed: 0,Year
count,24570.0
mean,1995.5
std,12.121165
min,1975.0
25%,1985.0
50%,1995.5
75%,2006.0
max,2016.0


In [12]:
# 7 rename the 'Obesity (%)' column
df.rename(columns={'Obesity (%)':'Obesity'},inplace=True)

In [13]:
# 7.1 check the values in the column
df['Obesity'][0].split()[0]

'0.5'

In [14]:
# 8 function to clean the values in the Obesity column
def correcao_obesity(obesity):
  # keep only the point estimate, e.g. '0.5 [0.2-1.1]' -> '0.5'
  return obesity.split()[0]


# testing the function
correcao_obesity(df['Obesity'][0])

'0.5'

In [15]:
# 9 apply the function to the Obesity column
df['Obesity'] = df['Obesity'].apply(correcao_obesity)

In [16]:
# 9.1 check that the function worked
df['Obesity']

0         0.5
1         0.2
2         0.8
3         0.5
4         0.2
         ... 
24565     4.5
24566    24.8
24567    15.5
24568     4.7
24569    25.3
Name: Obesity, Length: 24570, dtype: object
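
The same cleaning can be expressed with pandas' vectorized string methods instead of `apply`. The sketch below is a minimal illustration on a small stand-in Series; the first three values are copied from the `df.head()` output, and the `'No data'` entry is an assumption about how missing values look in the raw file.

```python
import pandas as pd

# Small stand-in for the raw 'Obesity' column
raw = pd.Series(["0.5 [0.2-1.1]", "0.2 [0.0-0.6]", "0.8 [0.2-2.0]", "No data"])

# Split each string on whitespace and keep the first token,
# which is what apply(correcao_obesity) does row by row
first_token = raw.str.split().str[0]
print(first_token.tolist())
```

Either form leaves the column as strings (`dtype: object`), so a numeric conversion is still needed afterwards.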

In [17]:
# 10 the same cleaning written with a lambda (a no-op here, since step 9 already removed the interval)
df['Obesity'] = df['Obesity'].apply(lambda obesity: obesity.split()[0])

In [18]:
df['Obesity']

0         0.5
1         0.2
2         0.8
3         0.5
4         0.2
         ... 
24565     4.5
24566    24.8
24567    15.5
24568     4.7
24569    25.3
Name: Obesity, Length: 24570, dtype: object

In [19]:
# 11 check the column for inconsistent values
df['Obesity'].value_counts()

No      504
0.4     222
0.6     218
0.5     217
0.7     210
       ... 
60.1      1
56.3      1
59.8      1
55.5      1
46.5      1
Name: Obesity, Length: 602, dtype: int64

In [20]:
# 12 replace the 'No' entries with NaN using a lambda
df['Obesity'] = df['Obesity'].apply(lambda obesity: np.nan if obesity == 'No' else obesity)

In [21]:
# count the NaN values per column
print(df.isna().sum())
# 13 drop the rows that contain NaN
df.dropna(inplace=True)
# show the counts again
print(df.isna().sum())


Country      0
Year         0
Obesity    504
Sex          0
dtype: int64
Country    0
Year       0
Obesity    0
Sex        0
dtype: int64


In [27]:
# convert the values to float
df['Obesity'] = df['Obesity'].astype(float)

In [28]:
# 14 check the data types of the columns
df.dtypes

Country     object
Year         int64
Obesity    float64
Sex         object
dtype: object
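
The replace, drop, and convert steps above can also be collapsed with `pd.to_numeric`. This is a sketch on a stand-in Series, assuming `'No'` is the only non-numeric marker in the column, as the `value_counts()` output suggests.

```python
import pandas as pd

# Stand-in for the 'Obesity' column after the interval was stripped
obesity = pd.Series(["0.5", "0.2", "No", "24.8"])

# errors='coerce' turns every value that cannot be parsed as a number into NaN
numeric = pd.to_numeric(obesity, errors="coerce")

# Drop the missing entries; the result is already float64
cleaned = numeric.dropna()
print(numeric.isna().sum(), cleaned.dtype)
```

Because `errors='coerce'` silently converts any unparseable string, it is worth inspecting `value_counts()` first, as the notebook does, so that unexpected markers are not hidden.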

In [29]:
df.describe()

Unnamed: 0,Year,Obesity
count,24066.0,24066.0
mean,1995.5,12.448932
std,12.12117,10.407428
min,1975.0,0.1
25%,1985.0,3.9
50%,1995.5,10.6
75%,2006.0,18.175
max,2016.0,63.3


In [31]:
df.head()

Unnamed: 0,Country,Year,Obesity,Sex
0,Afghanistan,1975,0.5,Both sexes
1,Afghanistan,1975,0.2,Male
2,Afghanistan,1975,0.8,Female
3,Afghanistan,1976,0.5,Both sexes
4,Afghanistan,1976,0.2,Male


In [34]:
# mean obesity by sex in 2015 (numeric_only skips the text column 'Country')
df_2015 = df[df['Year'] == 2015]
df_2015.groupby('Sex').mean(numeric_only=True)

Sex,Year,Obesity
Both sexes,2015.0,19.508377
Female,2015.0,22.899476
Male,2015.0,15.980628


In [36]:
# first (1975) and last (2016) years of the series, both sexes combined
df_ini = df[(df['Year'] == 1975) & (df['Sex'] == 'Both sexes')]
df_fin = df[(df['Year'] == 2016) & (df['Sex'] == 'Both sexes')]

In [37]:
df_ini

Unnamed: 0,Country,Year,Obesity,Sex
0,Afghanistan,1975,0.5,Both sexes
126,Albania,1975,6.5,Both sexes
252,Algeria,1975,6.9,Both sexes
378,Andorra,1975,12.9,Both sexes
504,Angola,1975,0.8,Both sexes
...,...,...,...,...
23940,Venezuela (Bolivarian Republic of),1975,9.6,Both sexes
24066,Viet Nam,1975,0.1,Both sexes
24192,Yemen,1975,2.8,Both sexes
24318,Zambia,1975,1.5,Both sexes


In [38]:
df_ini.head()

Unnamed: 0,Country,Year,Obesity,Sex
0,Afghanistan,1975,0.5,Both sexes
126,Albania,1975,6.5,Both sexes
252,Algeria,1975,6.9,Both sexes
378,Andorra,1975,12.9,Both sexes
504,Angola,1975,0.8,Both sexes


In [39]:
df_fin.head()

Unnamed: 0,Country,Year,Obesity,Sex
123,Afghanistan,2016,5.5,Both sexes
249,Albania,2016,21.7,Both sexes
375,Algeria,2016,27.4,Both sexes
501,Andorra,2016,25.6,Both sexes
627,Angola,2016,8.2,Both sexes


In [40]:
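# index by country so the two years can be aligned (reassigning avoids modifying a slice of df in place)
df_ini = df_ini.set_index('Country')
df_ini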

Country,Year,Obesity,Sex
Afghanistan,1975,0.5,Both sexes
Albania,1975,6.5,Both sexes
Algeria,1975,6.9,Both sexes
Andorra,1975,12.9,Both sexes
Angola,1975,0.8,Both sexes
...,...,...,...
Venezuela (Bolivarian Republic of),1975,9.6,Both sexes
Viet Nam,1975,0.1,Both sexes
Yemen,1975,2.8,Both sexes
Zambia,1975,1.5,Both sexes


In [41]:
# same indexing for the final year
df_fin = df_fin.set_index('Country')
df_fin

Country,Year,Obesity,Sex
Afghanistan,2016,5.5,Both sexes
Albania,2016,21.7,Both sexes
Algeria,2016,27.4,Both sexes
Andorra,2016,25.6,Both sexes
Angola,2016,8.2,Both sexes
...,...,...,...
Venezuela (Bolivarian Republic of),2016,25.6,Both sexes
Viet Nam,2016,2.1,Both sexes
Yemen,2016,17.1,Both sexes
Zambia,2016,8.1,Both sexes


In [44]:
# change in obesity (percentage points) between 1975 and 2016, per country
df_final = df_fin['Obesity'] - df_ini['Obesity']
df_final.sort_values()

Country
Viet Nam         2.0
Singapore        3.1
Japan            3.3
Bangladesh       3.4
Timor-Leste      3.6
                ... 
Cook Islands    27.9
Tonga           28.3
Kiribati        30.1
Niue            31.1
Tuvalu          33.7
Name: Obesity, Length: 191, dtype: float64
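
The two-year comparison can also be built from a single pivoted table rather than two separately indexed frames. The sketch below uses a tiny stand-in frame named `sample` (a hypothetical name) with values taken from the outputs above.

```python
import pandas as pd

# Tiny stand-in for df, restricted to 'Both sexes'
sample = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan", "Albania", "Albania"],
    "Year": [1975, 2016, 1975, 2016],
    "Obesity": [0.5, 5.5, 6.5, 21.7],
    "Sex": ["Both sexes"] * 4,
})

# One row per country, one column per year
wide = sample.pivot(index="Country", columns="Year", values="Obesity")

# Change in percentage points between the first and last year
change = (wide[2016] - wide[1975]).sort_values()
print(change)
```

The pivot makes the alignment by country explicit and extends naturally to comparisons between any pair of years.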

In [46]:
# countries ranked by obesity rate in 2016, both sexes combined
df_2016 = df[(df['Year'] == 2016) & (df['Sex'] == 'Both sexes')]
df_2016.sort_values(by='Obesity')

Unnamed: 0,Country,Year,Obesity,Sex
24189,Viet Nam,2016,2.1,Both sexes
1761,Bangladesh,2016,3.6,Both sexes
21921,Timor-Leste,2016,3.8,Both sexes
9951,India,2016,3.9,Both sexes
3777,Cambodia,2016,3.9,Both sexes
...,...,...,...,...
22803,Tuvalu,2016,51.6,Both sexes
13605,Marshall Islands,2016,52.9,Both sexes
16503,Palau,2016,55.3,Both sexes
5037,Cook Islands,2016,55.9,Both sexes


In [47]:
# Brazil only, keeping the separate Male and Female series
df_brasil = df[(df['Country'] == 'Brazil') & (df['Sex'] != 'Both sexes')]
df_brasil

Unnamed: 0,Country,Year,Obesity,Sex
2899,Brazil,1975,3.0,Male
2900,Brazil,1975,7.3,Female
2902,Brazil,1976,3.2,Male
2903,Brazil,1976,7.6,Female
2905,Brazil,1977,3.4,Male
...,...,...,...,...
3017,Brazil,2014,24.4,Female
3019,Brazil,2015,18.0,Male
3020,Brazil,2015,24.9,Female
3022,Brazil,2016,18.5,Male


In [50]:

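# index by year so the Female and Male series align before subtracting
df_brasil_year = df_brasil.set_index('Year')
df_brasil_dif = df_brasil_year[df_brasil_year['Sex'] == 'Female']['Obesity'] - df_brasil_year[df_brasil_year['Sex'] == 'Male']['Obesity']
df_brasil_dif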

Year
1975    4.3
1976    4.4
1977    4.6
1978    4.7
1979    4.9
1980    4.9
1981    5.1
1982    5.2
1983    5.4
1984    5.5
1985    5.6
1986    5.7
1987    5.8
1988    5.9
1989    6.0
1990    6.1
1991    6.1
1992    6.3
1993    6.3
1994    6.4
1995    6.4
1996    6.5
1997    6.6
1998    6.7
1999    6.8
2000    6.8
2001    6.8
2002    6.8
2003    6.9
2004    6.9
2005    6.9
2006    6.9
2007    7.0
2008    6.9
2009    7.0
2010    7.0
2011    6.9
2012    6.9
2013    6.9
2014    6.9
2015    6.9
2016    6.9
Name: Obesity, dtype: float64
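
The result above is indexed by `Year`, which matters because pandas aligns two Series on their index labels before subtracting. The sketch below illustrates this on a stand-in frame named `sample` (a hypothetical name) that reuses the first Brazil rows and their original row labels.

```python
import pandas as pd

# Stand-in for df_brasil, keeping the original row labels
sample = pd.DataFrame(
    {
        "Year": [1975, 1975, 1976, 1976],
        "Sex": ["Male", "Female", "Male", "Female"],
        "Obesity": [3.0, 7.3, 3.2, 7.6],
    },
    index=[2899, 2900, 2902, 2903],
)

female = sample[sample["Sex"] == "Female"]
male = sample[sample["Sex"] == "Male"]

# The row labels differ (2900 vs 2899), so no pair lines up and every result is NaN
misaligned = female["Obesity"] - male["Obesity"]

# Indexing both sides by year gives them the same labels, so the values pair correctly
aligned = (female.set_index("Year")["Obesity"] - male.set_index("Year")["Obesity"]).round(1)
print(misaligned.isna().all(), aligned.to_dict())
```

With the year as the index, the gap comes out as 4.3 and 4.4 percentage points for 1975 and 1976, matching the first two rows of the output above.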