# Imputation Research Project <img src="https://miro.medium.com/max/1400/1*JPZcoAD9kERfEQxwlaPT-A.jpeg" alt="Alt text image not displaying" width="500" align="right" />
## Notebook 1.1: Exploratory Data Analysis

**Author:** Chike Odenigbo

**Date:** November 25th, 2022

**Notebook Structure:**

* 1.0 Preprocessing

* **1.1 Exploratory Data Analysis**

* 1.2 Masking

* 2.* Models


Water Sugar Alcohol

In [1]:
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
from rpy2.robjects import globalenv

import pandas as pd
from scipy.stats import variation
from src.preprocessing.preprocessing import NumericalVariableCleaner, NumericalMasker # utility preprocessing packages found in src folder
import os
from src.visualization.visualize import histogram, box_plot, bar_plot
from itertools import chain, combinations
from pathlib import Path
from notebook_config import ROOT_DIR # setup.py file changed the root of the project so it is set in the config file
ROOT_DIR = ROOT_DIR.as_posix() # convert the root path to a string with forward slashes (as_posix replaces Windows backslashes) so it can be used in the f-string paths below
import json
import numpy as np
import functools as ft

Unable to determine R home: [WinError 2] The system cannot find the file specified


In [2]:
notebook_nm = '2.0-masking'
fig_dir = f'{ROOT_DIR}/reports/figures/'
output_prefix = notebook_nm

In [4]:
nutrition_df.rank(ascending = False, method = 'first', na_option = 'bottom', pct = True)

Unnamed: 0,name,serving_size,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water
0,0.612243,0.000114,0.899078,0.819206,0.536011,0.837979,0.510069,0.784390,0.134714,0.894641,0.716009,0.891910,0.887473,0.665263,0.542155,0.089885,0.276596,0.092616,0.248834,0.529639,0.854705,0.441347,0.264080,0.617818,0.617818,0.470816,0.907270,0.650245,0.753897,0.842872,0.411310,0.839800,0.923199,0.535670,0.846285,0.927409,0.557060,0.564569,0.548185,0.534646,0.558084,0.562180,0.560132,0.143020,0.564911,0.549437,0.574127,0.557629,0.562863,0.546592,0.561156,0.566731,0.564569,0.555353,0.563887,0.005689,0.436000,0.548868,0.134828,0.021390,0.152122,0.045511,0.079417,0.137217,0.947207,0.894413,0.821709,0.857663,0.536011,0.008533,0.936625,0.034361,0.027876,0.825691
1,0.335874,0.000228,0.017294,0.161679,0.536125,0.959722,0.216066,0.316646,0.134828,0.593128,0.135510,0.545682,0.063261,0.398794,0.417681,0.089999,0.143816,0.042895,0.182842,0.529753,0.371487,0.304130,0.264194,0.098987,0.098987,0.181818,0.233474,0.017294,0.253840,0.046080,0.008989,0.121174,0.121402,0.503129,0.127204,0.461941,0.360337,0.253157,0.323017,0.337012,0.363864,0.333144,0.341677,0.143133,0.376948,0.385937,0.400614,0.350552,0.365002,0.420071,0.350324,0.365912,0.375811,0.399022,0.374559,0.422574,0.044487,0.301399,0.127773,0.021504,0.146092,0.045625,0.079531,0.032427,0.017522,0.163500,0.007509,0.013995,0.536125,0.008647,0.368529,0.034475,0.027990,0.887245
2,0.579929,0.000341,0.858801,0.819320,0.536238,0.920241,0.434407,0.316760,0.134941,0.675503,0.494368,0.778359,0.703948,0.489703,0.501650,0.090113,0.173512,0.092730,0.147002,0.529867,0.577768,0.256229,0.264308,0.330413,0.330413,0.181932,0.721015,0.469678,0.852771,0.643190,0.241779,0.792923,0.474457,0.714416,0.784503,0.851291,0.524633,0.531005,0.491182,0.534759,0.526112,0.523381,0.528502,0.143247,0.530322,0.532484,0.535897,0.539993,0.529412,0.527933,0.529753,0.533735,0.537831,0.528957,0.530663,0.578678,0.192741,0.317784,0.043008,0.021618,0.045739,0.045739,0.079645,0.100580,0.884742,0.839914,0.821823,0.796223,0.536238,0.008761,0.783821,0.034589,0.028103,0.049494
3,0.068722,0.000455,0.645807,0.663784,0.536352,0.822392,0.366139,0.784503,0.135055,0.404142,0.110593,0.230857,0.147571,0.558994,0.542269,0.090226,0.211970,0.092843,0.109000,0.529981,0.151894,0.441461,0.264421,0.541359,0.541359,0.244169,0.085220,0.032199,0.059051,0.024576,0.004096,0.041984,0.109910,0.490272,0.177153,0.364888,0.304358,0.355786,0.333713,0.210263,0.195130,0.329048,0.328251,0.143361,0.325748,0.309478,0.373763,0.265445,0.272613,0.343498,0.302082,0.311753,0.310502,0.305154,0.306292,0.086130,0.061668,0.395836,0.083969,0.021732,0.078735,0.045853,0.077711,0.073160,0.649448,0.663784,0.628172,0.363409,0.536352,0.008875,0.200933,0.034702,0.028217,0.819888
4,0.143020,0.000569,0.667198,0.527136,0.515189,0.714757,0.425873,0.645125,0.135169,0.866538,0.537718,0.624189,0.759131,0.416543,0.299807,0.090340,0.255319,0.052679,0.211742,0.472067,0.788599,0.252930,0.264535,0.605302,0.605302,0.470930,0.278530,0.747070,0.889749,0.768802,0.621231,0.731938,0.783707,0.595290,0.620207,0.840710,0.568096,0.572989,0.550802,0.555240,0.568893,0.567983,0.573672,0.143475,0.574468,0.574468,0.575947,0.575492,0.574127,0.567186,0.567869,0.574013,0.566617,0.570486,0.574468,0.267493,0.374787,0.099101,0.134941,0.021845,0.152236,0.045967,0.079759,0.137331,0.668904,0.541017,0.636136,0.793037,0.515189,0.008989,0.849585,0.034816,0.028331,0.444078
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8784,0.874275,0.999545,0.589714,0.513369,0.272386,0.678348,0.154966,0.682216,0.999545,0.150529,0.432131,0.301399,0.577426,0.551940,0.445102,0.999545,0.999545,0.999545,0.999545,0.212197,0.084196,0.999545,0.263739,0.399590,0.399590,0.327910,0.641711,0.667198,0.294118,0.708954,0.715440,0.239049,0.294004,0.233474,0.175447,0.157242,0.124588,0.129252,0.126408,0.206053,0.136193,0.170213,0.093754,0.059506,0.139265,0.118785,0.105700,0.140972,0.152350,0.166117,0.141654,0.106269,0.129708,0.113892,0.145068,0.999545,0.999545,0.999545,0.999545,0.999545,0.999545,0.999545,0.999545,0.999545,0.586870,0.513028,0.503698,0.676983,0.272386,0.999545,0.536125,0.999545,0.999545,0.340539
8785,0.425873,0.999659,0.372852,0.271362,0.050859,0.702241,0.999886,1.000000,0.999659,0.096143,0.278644,0.074980,0.351007,0.999886,0.999886,0.999659,0.999659,0.999659,0.999659,0.096143,0.465013,0.999659,0.999886,0.440209,0.440209,0.999886,0.641825,0.322221,0.289908,0.452384,0.470019,0.167937,0.567072,0.573558,0.140289,0.042098,0.035044,0.086699,0.066105,0.041871,0.078052,0.073273,0.069064,0.999886,0.031403,0.058255,0.037774,0.059392,0.044374,0.090226,0.079873,0.046422,0.032996,0.062123,0.024576,0.999659,0.999659,0.999659,0.999659,0.999659,0.999659,0.999659,0.999659,0.999659,0.372739,0.271021,0.328479,0.529867,0.050859,0.999659,0.340653,0.999659,0.999659,0.553647
8786,0.416430,0.999772,0.125725,0.060872,0.179315,0.743202,1.000000,0.784276,0.999772,0.142792,0.318239,0.168392,0.351121,1.000000,1.000000,0.999772,0.999772,0.999772,0.999772,0.140175,0.522016,0.999772,1.000000,0.420526,0.420526,1.000000,0.641939,0.463193,0.465013,0.643077,0.538742,0.412334,0.689271,0.607578,0.275572,0.313005,0.258050,0.290477,0.270679,0.277734,0.316532,0.283650,0.256571,1.000000,0.255547,0.279895,0.253840,0.264877,0.280692,0.327569,0.302537,0.268290,0.238707,0.275117,0.253726,0.999772,0.999772,0.999772,0.999772,0.999772,0.999772,0.999772,0.999772,0.999772,0.123564,0.059392,0.105473,0.386051,0.179315,0.999772,0.682672,0.999772,0.999772,0.556377
8787,0.871772,0.999886,0.620093,0.557401,0.283422,0.683809,0.156332,0.682330,0.999886,0.131414,0.432928,0.425873,0.577540,0.631812,0.542041,0.999886,0.999886,0.999886,0.999886,0.188190,0.079417,0.999886,0.263853,0.390943,0.390943,0.328024,0.642053,0.692229,0.473546,0.709068,0.733758,0.226988,0.275344,0.225850,0.191603,0.158721,0.103425,0.099556,0.094550,0.174081,0.091933,0.187962,0.064740,0.094664,0.106952,0.079873,0.062123,0.114347,0.122653,0.156332,0.110479,0.067926,0.094891,0.077483,0.119240,0.999886,0.999886,0.999886,0.999886,0.999886,0.999886,0.999886,0.999886,0.999886,0.614746,0.553078,0.534873,0.684833,0.283422,0.999886,0.551485,0.999886,0.999886,0.316532
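The ranking call above is easier to read on a small example. The sketch below applies the same `rank` arguments to a hypothetical four-row frame (the values are made up, not taken from the nutrition data) to show that, with `ascending = False`, the largest value receives the smallest percentile, and that `na_option = 'bottom'` ranks a missing value last instead of leaving it as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing value
toy = pd.DataFrame({"water": [10.0, 30.0, np.nan, 20.0]})

# Same arguments as the notebook cell above
pct_rank = toy.rank(ascending=False, method="first", na_option="bottom", pct=True)

# Row 1 (the largest value) gets the lowest percentile,
# row 2 (the NaN) gets the highest one
print(pct_rank)
```

Because `method = 'first'` breaks ties by order of appearance, columns with many identical values (for example long runs of 0.0) still receive strictly increasing ranks, which is why the last rows of the output above share the same high percentiles across many columns.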


In [4]:
pd.set_option('display.max_columns', None)
nutrition_df = pd.read_csv(f'{ROOT_DIR}/data/interim/nutrition_numerical.csv')
masker = NumericalMasker()
nutrition_df = masker.mask(
    nutrition_df, 'water', k=4, n=None, no_cols_frac=0.1, no_cols=None,
    prob_range_non_mask=(0, 0.5), prob_range_mask=(0.5, 1), frac_na=0.1,
    rank_method='first', normalize_weights=True, selected_cols=None, seed=None,
    max_corr=0.9, min_corr=0.3, col_weights=None,
)
nutrition_df[nutrition_df['mask_ind_water']==1]

INFO:root:Starting masking process for the water field.
INFO:root:Selecting key columns.
INFO:root:Masking 878 rows which make up for 9.99% of the observations into 4 clusters.
INFO:root:Finished creating correlation matrix.
INFO:root:Normalizing weights.
INFO:root:Finished creating ranking matrix.
INFO:root:Finished ordering similar rows.
INFO:root:Finished creating clusters in the dataframe.
INFO:root:Finished creating index weights.


Unnamed: 0,name,serving_size,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water,mask_ind_water
1,"Nuts, pecans",100,72.0,6.2,0,0.0,40.5,22.0,0.0,1.167,0.863,0.130,0.660,56.0,3.0,0.0,29.0,9.0,17.0,0.00,0.210,1.1,0.0,1.40,1.40,3.5,70.0,1.200,2.53,121.0,4.500,277.0,410.0,3.8,4.53,9.17,0.397,1.177,0.929,0.152,1.829,0.453,0.262,0.000,0.336,0.598,0.287,0.183,0.426,0.363,0.474,0.306,0.093,0.215,0.411,13.86,9.6,3.97,0.04,0.0,0.04,0.0,0.00,3.90,71.97,6.180,40.801,21.614,0.0,0.0,1.49,0.0,0.0,3.52,1
2,"Eggplant, raw",100,0.2,,0,2.0,6.9,22.0,0.0,0.649,0.281,0.037,0.039,23.0,1.0,0.0,14.0,0.0,36.0,0.00,0.084,2.2,0.0,0.30,0.30,3.5,9.0,0.081,0.23,14.0,0.232,24.0,229.0,0.3,0.16,0.98,0.051,0.057,0.164,0.006,0.186,0.041,0.023,0.000,0.045,0.064,0.047,0.011,0.043,0.043,0.042,0.037,0.009,0.027,0.053,5.88,3.0,3.53,1.54,0.0,1.58,0.0,0.00,0.26,0.18,0.034,0.016,0.076,0.0,0.0,0.66,0.0,0.0,92.30,1
3,"Teff, uncooked",100,2.4,0.4,0,12.0,13.1,0.0,0.0,3.363,0.942,0.270,0.390,9.0,0.0,0.0,5.0,0.0,66.0,0.00,0.482,0.0,0.0,0.08,0.08,1.9,180.0,0.810,7.63,184.0,9.240,429.0,427.0,4.4,3.63,13.30,0.747,0.517,0.820,0.236,3.349,0.477,0.301,0.000,0.501,1.068,0.376,0.428,0.698,0.664,0.622,0.510,0.139,0.458,0.686,73.13,8.0,1.84,0.47,0.0,0.73,0.0,0.01,0.62,2.38,0.449,0.589,1.071,0.0,0.0,2.37,0.0,0.0,8.82,1
13,"Crackers, rusk toast",100,7.2,1.4,78,253.0,0.0,87.0,23.0,4.625,0.605,0.399,0.404,41.0,12.0,0.0,0.0,0.0,0.0,0.18,0.047,0.0,0.0,0.00,0.00,0.0,27.0,0.245,2.72,36.0,0.439,153.0,245.0,20.1,1.10,13.50,0.606,0.649,0.944,0.272,3.192,0.498,0.309,0.000,0.610,1.010,0.684,0.290,0.668,1.052,0.768,0.512,0.171,0.471,0.684,72.30,0.0,0.00,0.00,0.0,0.00,0.0,0.00,0.00,7.20,1.376,2.755,2.310,78.0,0.0,1.20,0.0,0.0,5.50,1
29,"Nuts, dried, pine nuts",100,68.0,4.9,0,2.0,55.8,34.0,0.0,4.387,0.313,0.227,0.364,29.0,1.0,0.0,17.0,0.0,9.0,0.00,0.094,0.8,0.0,9.33,9.33,53.9,16.0,1.324,5.53,251.0,8.802,575.0,597.0,0.7,6.45,13.69,0.684,2.413,1.303,0.289,2.926,0.691,0.341,0.000,0.542,0.991,0.540,0.259,0.524,0.673,0.835,0.370,0.107,0.509,0.687,13.08,3.7,3.59,0.07,0.0,0.07,0.0,0.00,3.45,68.37,4.899,18.764,34.071,0.0,0.0,2.59,0.0,0.0,2.28,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8756,"Beef, braised, cooked, choice, trimmed to 0"" f...",100,8.3,2.8,97,60.0,100.6,8.0,0.0,4.923,0.830,0.282,0.080,5.0,1.0,0.0,0.0,0.0,0.0,3.38,0.512,0.0,6.0,0.13,0.13,1.6,13.0,0.140,3.76,25.0,0.015,234.0,358.0,38.6,9.39,31.32,1.811,2.106,2.884,0.333,5.101,1.395,1.033,0.158,1.371,2.593,2.818,0.914,1.221,1.289,1.230,1.418,0.359,1.111,1.449,0.00,0.0,0.00,0.00,0.0,0.00,0.0,0.00,0.00,8.30,2.835,3.757,0.470,97.0,0.0,1.42,0.0,0.0,60.05,1
8771,"Infant formula, not reconstituted, liquid conc...",100,6.5,2.8,0,46.0,15.7,20.0,20.0,1.650,0.000,0.119,0.076,381.0,100.0,0.0,0.0,0.0,0.0,0.38,0.076,15.2,82.0,1.86,1.86,10.2,133.0,0.152,2.29,14.0,0.000,80.0,146.0,2.5,1.14,3.20,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,14.10,0.0,11.20,0.00,0.0,0.00,0.0,0.00,0.00,6.50,2.800,2.030,1.317,0.0,0.0,0.80,0.0,0.0,75.40,1
8775,"Beef, raw, all grades, trimmed to 0"" fat, sepa...",100,4.7,1.7,65,66.0,0.0,4.0,0.0,5.774,0.000,0.180,0.063,6.0,2.0,0.0,0.0,0.0,0.0,2.98,0.648,0.0,3.0,0.00,0.00,1.5,5.0,0.099,2.28,26.0,0.000,226.0,402.0,23.9,5.29,22.66,1.479,1.662,2.393,0.249,4.040,1.068,0.919,0.114,1.155,2.131,2.382,0.658,0.987,1.040,1.005,1.164,0.278,0.925,1.221,0.04,0.0,0.00,0.00,0.0,0.00,0.0,0.00,0.00,4.66,1.683,2.164,0.191,65.0,0.0,1.06,0.0,0.0,71.57,1
8778,"Beef, roasted, cooked, choice, trimmed to 1/8""...",100,30.0,12.0,84,64.0,86.1,7.0,0.0,3.430,0.310,0.170,0.070,0.0,0.0,0.0,0.0,0.0,0.0,2.55,0.230,0.0,19.0,0.24,0.24,2.0,11.0,0.084,2.35,20.0,0.013,176.0,302.0,22.4,5.40,22.60,1.363,1.428,2.065,0.253,3.396,1.233,0.774,0.000,1.016,1.786,1.880,0.579,0.882,0.998,0.864,0.987,0.253,0.759,1.099,0.00,0.0,0.00,0.00,0.0,0.00,0.0,0.00,0.00,29.79,12.010,12.800,1.050,84.0,0.0,0.98,0.0,0.0,46.87,1
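The internals of `NumericalMasker` live in the `src` folder and are not shown in this notebook, so the following is only a rough sketch of the general idea of rank-weighted masking, not the class's actual algorithm. The function `flag_rows_by_rank` and its parameters are hypothetical. Like the output above, where `water` is still visible for rows with `mask_ind_water == 1`, the sketch only flags rows and leaves the values in place.

```python
import numpy as np
import pandas as pd

def flag_rows_by_rank(df, target, driver, frac_na=0.1, seed=0):
    """Flag a fraction of rows for masking, favouring rows where `driver` ranks high.

    Hypothetical helper for illustration; it is not the NumericalMasker logic.
    """
    rng = np.random.default_rng(seed)
    n_mask = int(len(df) * frac_na)

    # Percentile rank of the driver column, normalised into sampling weights
    weights = df[driver].rank(pct=True, na_option="bottom").to_numpy()
    weights = weights / weights.sum()

    # Draw row positions without replacement according to the weights
    chosen = rng.choice(len(df), size=n_mask, replace=False, p=weights)

    flag = np.zeros(len(df), dtype=int)
    flag[chosen] = 1
    out = df.copy()
    out[f"mask_ind_{target}"] = flag
    return out

# Toy data: 100 rows, two numeric columns (made-up values)
toy = pd.DataFrame({
    "water": np.linspace(0, 99, 100),
    "sugars": np.linspace(50, 0.5, 100),
})
flagged = flag_rows_by_rank(toy, target="water", driver="sugars", frac_na=0.1, seed=42)
print(flagged["mask_ind_water"].sum())
```

Applying the mask afterwards would be a single assignment such as `flagged.loc[flagged["mask_ind_water"] == 1, "water"] = np.nan`, which keeps the indicator column available for evaluating imputation models later.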


In [5]:
masker._similar_ordered_index

Unnamed: 0,original_index,score
0,290,1.020886
1,2250,1.032475
2,1282,1.045401
3,6143,1.063398
4,6107,1.081666
...,...,...
8784,6209,3.267691
8785,6145,3.298780
8786,619,3.376419
8787,772,3.392531


In [38]:
# TODO Show histogram of the masked variables distribution 
nutrition_df[nutrition_df.index.isin([6209,6145,619, 3840, 772])] 

Unnamed: 0,name,serving_size,total_fat,saturated_fat,cholesterol,sodium,choline,folate,folic_acid,niacin,pantothenic_acid,riboflavin,thiamin,vitamin_a,vitamin_a_rae,carotene_alpha,carotene_beta,cryptoxanthin_beta,lutein_zeaxanthin,vitamin_b12,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,tocopherol_alpha,vitamin_k,calcium,copper,irom,magnesium,manganese,phosphorous,potassium,selenium,zink,protein,alanine,arginine,aspartic_acid,cystine,glutamic_acid,glycine,histidine,hydroxyproline,isoleucine,leucine,lysine,methionine,phenylalanine,proline,serine,threonine,tryptophan,tyrosine,valine,carbohydrate,fiber,sugars,fructose,galactose,glucose,lactose,maltose,sucrose,fat,saturated_fatty_acids,monounsaturated_fatty_acids,polyunsaturated_fatty_acids,fatty_acids_total_trans,alcohol,ash,caffeine,theobromine,water,mask_ind
619,"Salt, table",100,0.0,,0,38758.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,24.0,0.03,0.33,1.0,0.1,0.0,8.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,99.8,0.0,0.0,0.2,0
772,"Leavening agents, baking soda",100,0.0,,0,27360.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,36.9,0.0,0.0,0.2,0
3840,"Seasoning mix, coriander & annatto, sazon, dry",100,0.0,,0,17000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,99.8,0.0,0.0,0.2,0
6145,"Alcoholic beverage, all (gin, rum, vodka, whis...",100,0.0,,0,1.0,0.0,0.0,0.0,0.013,0.0,0.004,0.006,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001,0.0,0.0,0.0,0.0,0.0,0.0,0.021,0.04,0.0,0.018,4.0,2.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,42.5,0.01,0.0,0.0,57.5,0
6209,"Alcoholic beverage, all (gin, rum, vodka, whis...",100,0.0,,0,1.0,0.0,0.0,0.0,0.013,0.0,0.004,0.006,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001,0.0,0.0,0.0,0.0,0.0,0.0,0.021,0.04,0.0,0.018,4.0,2.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,39.7,0.0,0.0,0.0,60.3,0
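For the TODO in the cell above, one possible starting point is to bin the target variable on shared edges and count flagged and unflagged rows per bin. This is a sketch under assumptions: the helper `masked_vs_kept_hist` is hypothetical, the toy values are made up, and the indicator column is assumed to hold 0/1 as in the `mask_ind_water` output earlier.

```python
import numpy as np
import pandas as pd

def masked_vs_kept_hist(df, value_col, ind_col, bins=10):
    """Return per-bin counts of `value_col` for flagged (1) and unflagged (0) rows."""
    # Shared bin edges so both groups are directly comparable
    edges = np.histogram_bin_edges(df[value_col].dropna(), bins=bins)
    masked = df.loc[df[ind_col] == 1, value_col].dropna()
    kept = df.loc[df[ind_col] == 0, value_col].dropna()
    masked_counts, _ = np.histogram(masked, bins=edges)
    kept_counts, _ = np.histogram(kept, bins=edges)
    return pd.DataFrame({
        "bin_left": edges[:-1],
        "bin_right": edges[1:],
        "masked": masked_counts,
        "kept": kept_counts,
    })

# Toy frame with a 0/1 indicator (hypothetical values)
toy = pd.DataFrame({
    "water": [3.5, 92.3, 8.8, 5.5, 60.0, 75.4, 0.2, 57.5],
    "mask_ind_water": [1, 1, 1, 1, 0, 0, 0, 0],
})
table = masked_vs_kept_hist(toy, "water", "mask_ind_water", bins=4)
print(table)
```

The resulting table can be passed to any bar-plot routine; keeping the counts separate from the plotting makes it easy to check whether the flagged rows are concentrated in particular ranges of the variable.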


In [13]:
# Import R's base package
base = importr("base")

# Import R's utility packages
utils = importr("utils")

# Select mirror 
utils.chooseCRANmirror(ind=1)

# For automatic translation of Pandas objects to R
pandas2ri.activate()

# Enable R magic
%load_ext rpy2.ipython

globalenv["nutrition_df_r"] = nutrition_df

The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython


In [4]:
utils.install_packages("remotes")
%R remotes::install_github("njtierney/naniar")

R[write to console]: Installing package into 'C:/Users/Chike/AppData/Local/R/win-library/4.2'
(as 'lib' is unspecified)

R[write to console]: trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.2/remotes_2.4.2.zip'

R[write to console]: Content type 'application/zip'
R[write to console]:  length 399984 bytes (390 KB)

R[write to console]: downloaded 390 KB




package 'remotes' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\Chike\AppData\Local\Temp\Rtmpe8QF83\downloaded_packages


R[write to console]: Downloading GitHub repo njtierney/naniar@HEAD



These packages have more recent versions available.
It is recommended to update all of them.
Which would you like to update?

1: All                          
2: CRAN packages only           
3: None                         
4: vctrs (0.5.0 -> 0.5.1) [CRAN]
5: plyr  (1.8.7 -> 1.8.8) [CRAN]

Enter one or more numbers, or an empty line to skip updates: 3


R[write to console]: Installing 15 packages: prettyunits, bit, progress, bit64, vroom, hms, crayon, clipr, gridExtra, readr, UpSetR, viridis, forcats, visdat, norm

R[write to console]: Installing packages into 'C:/Users/Chike/AppData/Local/R/win-library/4.2'
(as 'lib' is unspecified)




  There is a binary version available but the source version is later:
    binary source needs_compilation
bit  4.0.4  4.0.5              TRUE

Do you want to install from sources the package which needs compilation? (Yes/no/cancel) yes


R[write to console]: trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.2/prettyunits_1.1.1.zip'

R[write to console]: Content type 'application/zip'
R[write to console]:  length 37727 bytes (36 KB)

R[write to console]: downloaded 36 KB


R[write to console]: trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.2/progress_1.2.2.zip'

R[write to console]: Content type 'application/zip'
R[write to console]:  length 85980 bytes (83 KB)

R[write to console]: downloaded 83 KB


R[write to console]: trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.2/bit64_4.0.5.zip'

R[write to console]: Content type 'application/zip'
R[write to console]:  length 494410 bytes (482 KB)

R[write to console]: downloaded 482 KB


R[write to console]: trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.2/vroom_1.6.0.zip'

R[write to console]: Content type 'application/zip'
R[write to console]:  length 1350850 bytes (1.3 MB)

R[write to console]: downloaded 1.3 MB


R[wr

package 'prettyunits' successfully unpacked and MD5 sums checked
package 'progress' successfully unpacked and MD5 sums checked
package 'bit64' successfully unpacked and MD5 sums checked
package 'vroom' successfully unpacked and MD5 sums checked
package 'hms' successfully unpacked and MD5 sums checked
package 'crayon' successfully unpacked and MD5 sums checked
package 'clipr' successfully unpacked and MD5 sums checked
package 'gridExtra' successfully unpacked and MD5 sums checked
package 'readr' successfully unpacked and MD5 sums checked
package 'UpSetR' successfully unpacked and MD5 sums checked
package 'viridis' successfully unpacked and MD5 sums checked
package 'forcats' successfully unpacked and MD5 sums checked
package 'visdat' successfully unpacked and MD5 sums checked
package 'norm' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\Chike\AppData\Local\Temp\Rtmpe8QF83\downloaded_packages


R[write to console]: installing the source package 'bit'


R[write to console]: trying URL 'https://cloud.r-project.org/src/contrib/bit_4.0.5.tar.gz'

R[write to console]: Content type 'application/x-gzip'
R[write to console]:  length 827745 bytes (808 KB)

R[write to console]: downloaded 808 KB


R[write to console]: 

R[write to console]: 
R[write to console]: The downloaded source packages are in
	'C:\Users\Chike\AppData\Local\Temp\Rtmpe8QF83\downloaded_packages'
R[write to console]: 
R[write to console]: 

R[write to console]: Running `R CMD build`...



* checking for file 'C:\Users\Chike\AppData\Local\Temp\Rtmpe8QF83\remotes519c1c7e16a4\njtierney-naniar-eefd800/DESCRIPTION' ... OK
* preparing 'naniar':
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building 'naniar_0.6.1.9000.tar.gz'


R[write to console]: Installing package into 'C:/Users/Chike/AppData/Local/R/win-library/4.2'
(as 'lib' is unspecified)



0
'naniar'


In [5]:
%R library(naniar)

0,1,2,3,4,5,6
'naniar','tools','stats',...,'datasets','methods','base'


In [16]:
%R mcar_test(nutrition_df_r[c("saturated_fat")])

Unnamed: 0,statistic,df,p.value,missing.patterns
1,3.0024360000000003e-27,0.0,0.0,2
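The `missing.patterns` and `df` columns in the output above can be reproduced by hand. As I understand Little's MCAR test, the degrees of freedom are the number of observed variables summed over all missingness patterns, minus the number of variables. The sketch below computes both quantities in pandas; the helper `little_test_df` and the toy values are hypothetical, and the test statistic itself is not computed.

```python
import numpy as np
import pandas as pd

def little_test_df(df):
    """Return (number of missingness patterns, degrees of freedom) for Little's test."""
    # Each distinct row of the NaN indicator matrix is one missingness pattern
    patterns = df.isna().drop_duplicates()
    n_patterns = len(patterns)
    # Number of observed (non-missing) variables in each pattern
    observed_per_pattern = (~patterns).sum(axis=1)
    dof = int(observed_per_pattern.sum() - df.shape[1])
    return n_patterns, dof

# One column with some missing values (toy data)
one_col = pd.DataFrame({"saturated_fat": [6.2, np.nan, 0.4, 1.4]})
# Two columns with different missingness patterns (toy data)
two_cols = one_col.assign(total_fat=[72.0, 0.2, np.nan, 7.2])

print(little_test_df(one_col))   # (2, 0)
print(little_test_df(two_cols))  # (3, 2)
```

With a single column there are only two patterns (observed and missing) and zero degrees of freedom, which matches the `df` of 0.0 and the near-zero statistic reported above. Passing several columns to `mcar_test` gives the test something to compare across patterns.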
