# Getting Started with Pandas
### <p style="color:Tomato">Learn the basics of the pandas library.<p/>
#### <p style="color:Gray">Pandas<p/>
Pandas is a library that unifies the most common workflows for which data analysts and data scientists previously relied on many different libraries.
#### <p style="color:Gray">DataFrame<p/>
* Tabular data is any data that can be represented as rows and columns. CSV files are a common example of tabular data.
* To represent tabular data, pandas uses a custom data structure called a dataframe.
* A dataframe is a highly efficient, 2-dimensional data structure that provides a suite of methods and attributes to quickly explore, analyze, and visualize data. A dataframe can:
    * Store mixed data types in rows and columns.
    * Handle missing values gracefully by using a special value, NaN, to represent them.
    * Contain axis labels for both rows and columns, which let you refer to elements in the dataframe more intuitively.
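
As a rough illustration of the three dataframe properties listed above, the following sketch builds a small dataframe by hand. The column names, row labels, and values are made up for illustration and do not come from any dataset used in this notebook.

```python
import numpy as np
import pandas

# Hypothetical toy data: a text column, an integer column, and a float
# column that contains one missing value.
toy = pandas.DataFrame(
    {
        "item": ["x", "y", "z"],
        "count": [1, 2, 3],
        "weight": [0.5, np.nan, 2.0],
    },
    index=["a", "b", "c"],  # custom row labels instead of 0, 1, 2
)

print(toy.dtypes)             # each column keeps its own data type
print(toy.isna().sum())       # number of missing values per column
print(toy.loc["b", "item"])   # one element, addressed by row and column label
```
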
#### <p style="color:Gray">food_info.csv<p/>
> * A dataset from the United States Department of Agriculture (USDA).
> * This dataset contains nutritional information on the most common foods Americans consume.
> * Each column in the dataset shows a different attribute of the foods and each row describes a different food item.
<br/>

<p style="color:Blue">**1. The food_info.csv file**<p/>

In [2]:
# Show the value of every expression in a cell, not only the last one.
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [1]:
import pandas

#### <p style="color:Gray">To read a CSV file into a dataframe<p/>

In [5]:
# Read the file `crime_rates.csv` into a dataframe named crime_rates.
crime_rates = pandas.read_csv("crime_rates.csv")
crime_rates[:10]

| | Albuquerque | 749 |
|---|---|---|
| 0 | Anaheim | 371 |
| 1 | Anchorage | 828 |
| 2 | Arlington | 503 |
| 3 | Atlanta | 1379 |
| 4 | Aurora | 425 |
| 5 | Austin | 408 |
| 6 | Bakersfield | 542 |
| 7 | Baltimore | 1405 |
| 8 | Boston | 835 |
| 9 | Buffalo | 1288 |
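
In the output above, the first record (Albuquerque, 749) sits in the header position, which suggests that `crime_rates.csv` has no header row and that `read_csv` used its first line as column names. One possible way to handle such a file is to pass `header=None` together with `names`. The sketch below uses an in-memory string as a stand-in for the file, and the column names `city` and `crime_rate` are hypothetical.

```python
import io
import pandas

# Hypothetical in-memory stand-in for a CSV file without a header row.
csv_text = "Albuquerque,749\nAnaheim,371\nAnchorage,828\n"

# Default behaviour: the first line is consumed as the column names.
with_default = pandas.read_csv(io.StringIO(csv_text))

# header=None keeps every line as data; names= supplies the column labels.
with_names = pandas.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["city", "crime_rate"],  # hypothetical labels
)

print(with_default.shape)  # one row fewer, because line 1 became the header
print(with_names.shape)
```
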


In [6]:
food_info = pandas.read_csv("food_info.csv")
food_info[:10]

| | NDB_No | Shrt_Desc | Water_(g) | Energ_Kcal | Protein_(g) | Lipid_Tot_(g) | Ash_(g) | Carbohydrt_(g) | Fiber_TD_(g) | Sugar_Tot_(g) | ... | Vit_A_IU | Vit_A_RAE | Vit_E_(mg) | Vit_D_mcg | Vit_D_IU | Vit_K_(mcg) | FA_Sat_(g) | FA_Mono_(g) | FA_Poly_(g) | Cholestrl_(mg) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | BUTTER WITH SALT | 15.87 | 717 | 0.85 | 81.11 | 2.11 | 0.06 | 0.0 | 0.06 | ... | 2499.0 | 684.0 | 2.32 | 1.5 | 60.0 | 7.0 | 51.368 | 21.021 | 3.043 | 215.0 |
| 1 | 1002 | BUTTER WHIPPED WITH SALT | 15.87 | 717 | 0.85 | 81.11 | 2.11 | 0.06 | 0.0 | 0.06 | ... | 2499.0 | 684.0 | 2.32 | 1.5 | 60.0 | 7.0 | 50.489 | 23.426 | 3.012 | 219.0 |
| 2 | 1003 | BUTTER OIL ANHYDROUS | 0.24 | 876 | 0.28 | 99.48 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 3069.0 | 840.0 | 2.8 | 1.8 | 73.0 | 8.6 | 61.924 | 28.732 | 3.694 | 256.0 |
| 3 | 1004 | CHEESE BLUE | 42.41 | 353 | 21.4 | 28.74 | 5.11 | 2.34 | 0.0 | 0.5 | ... | 721.0 | 198.0 | 0.25 | 0.5 | 21.0 | 2.4 | 18.669 | 7.778 | 0.8 | 75.0 |
| 4 | 1005 | CHEESE BRICK | 41.11 | 371 | 23.24 | 29.68 | 3.18 | 2.79 | 0.0 | 0.51 | ... | 1080.0 | 292.0 | 0.26 | 0.5 | 22.0 | 2.5 | 18.764 | 8.598 | 0.784 | 94.0 |
| 5 | 1006 | CHEESE BRIE | 48.42 | 334 | 20.75 | 27.68 | 2.7 | 0.45 | 0.0 | 0.45 | ... | 592.0 | 174.0 | 0.24 | 0.5 | 20.0 | 2.3 | 17.41 | 8.013 | 0.826 | 100.0 |
| 6 | 1007 | CHEESE CAMEMBERT | 51.8 | 300 | 19.8 | 24.26 | 3.68 | 0.46 | 0.0 | 0.46 | ... | 820.0 | 241.0 | 0.21 | 0.4 | 18.0 | 2.0 | 15.259 | 7.023 | 0.724 | 72.0 |
| 7 | 1008 | CHEESE CARAWAY | 39.28 | 376 | 25.18 | 29.2 | 3.28 | 3.06 | 0.0 | NaN | ... | 1054.0 | 271.0 | NaN | NaN | NaN | NaN | 18.584 | 8.275 | 0.83 | 93.0 |
| 8 | 1009 | CHEESE CHEDDAR | 37.1 | 406 | 24.04 | 33.82 | 3.71 | 1.33 | 0.0 | 0.28 | ... | 994.0 | 263.0 | 0.78 | 0.6 | 24.0 | 2.9 | 19.368 | 8.428 | 1.433 | 102.0 |
| 9 | 1010 | CHEESE CHESHIRE | 37.65 | 387 | 23.37 | 30.6 | 3.6 | 4.78 | 0.0 | NaN | ... | 985.0 | 233.0 | NaN | NaN | NaN | NaN | 19.475 | 8.671 | 0.87 | 103.0 |


In [7]:
print(type(food_info))

<class 'pandas.core.frame.DataFrame'>


#### <p style="color:Gray">head() method<p/>
The head() method returns a new dataframe containing the first 5 rows by default. Pass an integer, as in head(3), to choose a different number of rows.

In [10]:
first_rows = food_info.head()
first_rows
print(food_info.head(3))

| | NDB_No | Shrt_Desc | Water_(g) | Energ_Kcal | Protein_(g) | Lipid_Tot_(g) | Ash_(g) | Carbohydrt_(g) | Fiber_TD_(g) | Sugar_Tot_(g) | ... | Vit_A_IU | Vit_A_RAE | Vit_E_(mg) | Vit_D_mcg | Vit_D_IU | Vit_K_(mcg) | FA_Sat_(g) | FA_Mono_(g) | FA_Poly_(g) | Cholestrl_(mg) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | BUTTER WITH SALT | 15.87 | 717 | 0.85 | 81.11 | 2.11 | 0.06 | 0.0 | 0.06 | ... | 2499.0 | 684.0 | 2.32 | 1.5 | 60.0 | 7.0 | 51.368 | 21.021 | 3.043 | 215.0 |
| 1 | 1002 | BUTTER WHIPPED WITH SALT | 15.87 | 717 | 0.85 | 81.11 | 2.11 | 0.06 | 0.0 | 0.06 | ... | 2499.0 | 684.0 | 2.32 | 1.5 | 60.0 | 7.0 | 50.489 | 23.426 | 3.012 | 219.0 |
| 2 | 1003 | BUTTER OIL ANHYDROUS | 0.24 | 876 | 0.28 | 99.48 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 3069.0 | 840.0 | 2.8 | 1.8 | 73.0 | 8.6 | 61.924 | 28.732 | 3.694 | 256.0 |
| 3 | 1004 | CHEESE BLUE | 42.41 | 353 | 21.4 | 28.74 | 5.11 | 2.34 | 0.0 | 0.5 | ... | 721.0 | 198.0 | 0.25 | 0.5 | 21.0 | 2.4 | 18.669 | 7.778 | 0.8 | 75.0 |
| 4 | 1005 | CHEESE BRICK | 41.11 | 371 | 23.24 | 29.68 | 3.18 | 2.79 | 0.0 | 0.51 | ... | 1080.0 | 292.0 | 0.26 | 0.5 | 22.0 | 2.5 | 18.764 | 8.598 | 0.784 | 94.0 |


   NDB_No                 Shrt_Desc  Water_(g)  Energ_Kcal  Protein_(g)  \
0    1001          BUTTER WITH SALT      15.87         717         0.85   
1    1002  BUTTER WHIPPED WITH SALT      15.87         717         0.85   
2    1003      BUTTER OIL ANHYDROUS       0.24         876         0.28   

   Lipid_Tot_(g)  Ash_(g)  Carbohydrt_(g)  Fiber_TD_(g)  Sugar_Tot_(g)  \
0          81.11     2.11            0.06           0.0           0.06   
1          81.11     2.11            0.06           0.0           0.06   
2          99.48     0.00            0.00           0.0           0.00   

        ...        Vit_A_IU  Vit_A_RAE  Vit_E_(mg)  Vit_D_mcg  Vit_D_IU  \
0       ...          2499.0      684.0        2.32        1.5      60.0   
1       ...          2499.0      684.0        2.32        1.5      60.0   
2       ...          3069.0      840.0        2.80        1.8      73.0   

   Vit_K_(mcg)  FA_Sat_(g)  FA_Mono_(g)  FA_Poly_(g)  Cholestrl_(mg)  
0          7.0      51.368    
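
For a version of the head() behaviour that runs without the CSV file, here is a minimal sketch on a hypothetical 8-row dataframe. The column names `n` and `square` are invented for the example.

```python
import pandas

# Hypothetical dataframe with 8 rows.
numbers = pandas.DataFrame(
    {"n": range(8), "square": [i * i for i in range(8)]}
)

default_head = numbers.head()   # no argument: first 5 rows
short_head = numbers.head(3)    # explicit argument: first 3 rows

print(len(default_head), len(short_head))
```
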

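#### <p style="color:Gray">columns attribute<p/>
The columns attribute returns the column labels of the dataframe.
#### <p style="color:Gray">shape attribute<p/>
The shape attribute returns a tuple of the form (number of rows, number of columns).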

In [12]:
column_names = food_info.columns
column_names

Index(['NDB_No', 'Shrt_Desc', 'Water_(g)', 'Energ_Kcal', 'Protein_(g)',
       'Lipid_Tot_(g)', 'Ash_(g)', 'Carbohydrt_(g)', 'Fiber_TD_(g)',
       'Sugar_Tot_(g)', 'Calcium_(mg)', 'Iron_(mg)', 'Magnesium_(mg)',
       'Phosphorus_(mg)', 'Potassium_(mg)', 'Sodium_(mg)', 'Zinc_(mg)',
       'Copper_(mg)', 'Manganese_(mg)', 'Selenium_(mcg)', 'Vit_C_(mg)',
       'Thiamin_(mg)', 'Riboflavin_(mg)', 'Niacin_(mg)', 'Vit_B6_(mg)',
       'Vit_B12_(mcg)', 'Vit_A_IU', 'Vit_A_RAE', 'Vit_E_(mg)', 'Vit_D_mcg',
       'Vit_D_IU', 'Vit_K_(mcg)', 'FA_Sat_(g)', 'FA_Mono_(g)', 'FA_Poly_(g)',
       'Cholestrl_(mg)'],
      dtype='object')

In [14]:
dimensions = food_info.shape
num_rows = dimensions[0]
num_cols = dimensions[1]
dimensions
num_rows
num_cols

(8618, 36)

8618

36

In [15]:
food_info.shape[0]
food_info.shape[1]

8618

36
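
The same two attributes can be tried on a small, hypothetical dataframe. This sketch also shows tuple unpacking as an alternative to indexing `shape` with `[0]` and `[1]`.

```python
import pandas

# Hypothetical dataframe with 3 rows and 2 columns.
small = pandas.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

column_list = small.columns.tolist()   # plain Python list of column names
n_rows, n_cols = small.shape           # shape is a (rows, columns) tuple

print(column_list, n_rows, n_cols)
```
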

In [17]:
first_twenty = food_info.head(20)
first_twenty

| | NDB_No | Shrt_Desc | Water_(g) | Energ_Kcal | Protein_(g) | Lipid_Tot_(g) | Ash_(g) | Carbohydrt_(g) | Fiber_TD_(g) | Sugar_Tot_(g) | ... | Vit_A_IU | Vit_A_RAE | Vit_E_(mg) | Vit_D_mcg | Vit_D_IU | Vit_K_(mcg) | FA_Sat_(g) | FA_Mono_(g) | FA_Poly_(g) | Cholestrl_(mg) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | BUTTER WITH SALT | 15.87 | 717 | 0.85 | 81.11 | 2.11 | 0.06 | 0.0 | 0.06 | ... | 2499.0 | 684.0 | 2.32 | 1.5 | 60.0 | 7.0 | 51.368 | 21.021 | 3.043 | 215.0 |
| 1 | 1002 | BUTTER WHIPPED WITH SALT | 15.87 | 717 | 0.85 | 81.11 | 2.11 | 0.06 | 0.0 | 0.06 | ... | 2499.0 | 684.0 | 2.32 | 1.5 | 60.0 | 7.0 | 50.489 | 23.426 | 3.012 | 219.0 |
| 2 | 1003 | BUTTER OIL ANHYDROUS | 0.24 | 876 | 0.28 | 99.48 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 3069.0 | 840.0 | 2.8 | 1.8 | 73.0 | 8.6 | 61.924 | 28.732 | 3.694 | 256.0 |
| 3 | 1004 | CHEESE BLUE | 42.41 | 353 | 21.4 | 28.74 | 5.11 | 2.34 | 0.0 | 0.5 | ... | 721.0 | 198.0 | 0.25 | 0.5 | 21.0 | 2.4 | 18.669 | 7.778 | 0.8 | 75.0 |
| 4 | 1005 | CHEESE BRICK | 41.11 | 371 | 23.24 | 29.68 | 3.18 | 2.79 | 0.0 | 0.51 | ... | 1080.0 | 292.0 | 0.26 | 0.5 | 22.0 | 2.5 | 18.764 | 8.598 | 0.784 | 94.0 |
| 5 | 1006 | CHEESE BRIE | 48.42 | 334 | 20.75 | 27.68 | 2.7 | 0.45 | 0.0 | 0.45 | ... | 592.0 | 174.0 | 0.24 | 0.5 | 20.0 | 2.3 | 17.41 | 8.013 | 0.826 | 100.0 |
| 6 | 1007 | CHEESE CAMEMBERT | 51.8 | 300 | 19.8 | 24.26 | 3.68 | 0.46 | 0.0 | 0.46 | ... | 820.0 | 241.0 | 0.21 | 0.4 | 18.0 | 2.0 | 15.259 | 7.023 | 0.724 | 72.0 |
| 7 | 1008 | CHEESE CARAWAY | 39.28 | 376 | 25.18 | 29.2 | 3.28 | 3.06 | 0.0 | NaN | ... | 1054.0 | 271.0 | NaN | NaN | NaN | NaN | 18.584 | 8.275 | 0.83 | 93.0 |
| 8 | 1009 | CHEESE CHEDDAR | 37.1 | 406 | 24.04 | 33.82 | 3.71 | 1.33 | 0.0 | 0.28 | ... | 994.0 | 263.0 | 0.78 | 0.6 | 24.0 | 2.9 | 19.368 | 8.428 | 1.433 | 102.0 |
| 9 | 1010 | CHEESE CHESHIRE | 37.65 | 387 | 23.37 | 30.6 | 3.6 | 4.78 | 0.0 | NaN | ... | 985.0 | 233.0 | NaN | NaN | NaN | NaN | 19.475 | 8.671 | 0.87 | 103.0 |


#### <p style="color:Gray">Indexing<p/><hr/>
* Column labels (the column index)
* Row labels (the row index)

#### <p style="color:Gray">Series<p/><hr/>
A Series is the core data structure that pandas uses to represent rows and columns. It is a labelled collection of values, similar to a NumPy vector.
* A Series can use non-integer labels (NumPy arrays can only use integer positions for indexing).
* Pandas uses this feature to provide more context when returning a row or a column from a dataframe.
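
To make the contrast with NumPy concrete, the sketch below wraps a plain array in a Series with string labels. The three values are taken from the first row of food_info shown earlier, and the short labels `water`, `energy`, and `protein` are chosen only for illustration.

```python
import numpy as np
import pandas

values = np.array([15.87, 717.0, 0.85])

# The same values wrapped in a Series with string labels
# (labels are hypothetical shorthand, not the real column names).
labelled = pandas.Series(values, index=["water", "energy", "protein"])

print(values[1])            # NumPy array: access by integer position
print(labelled["energy"])   # Series: access by label
```
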

#### <p style="color:Gray">Selecting a row<p/><hr/>
#### <p style="color:Gray">loc method<p/>
Use loc[] to select rows in a dataframe.
* loc[] selects rows by row label.
* When no index is specified, pandas labels the rows with integers starting at 0, so the first row has label 0, the second row has label 1, and so on.
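
Because loc[] works on labels rather than positions, the two only coincide when the row labels are the default 0, 1, 2, and so on. The sketch below uses a hypothetical dataframe with shuffled integer labels, and brings in iloc[], the position-based counterpart in pandas, purely for contrast.

```python
import pandas

# Hypothetical dataframe whose row labels are not in the order 0, 1, 2.
df = pandas.DataFrame({"value": [10, 20, 30]}, index=[2, 0, 1])

by_label = df.loc[0]       # the row labelled 0, which is the second row here
by_position = df.iloc[0]   # the first row, whatever its label is

print(by_label["value"], by_position["value"])
```
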

In [18]:
# Series object representing the row at index 0.
food_info.loc[0]

NDB_No                         1001
Shrt_Desc          BUTTER WITH SALT
Water_(g)                     15.87
Energ_Kcal                      717
Protein_(g)                    0.85
Lipid_Tot_(g)                 81.11
Ash_(g)                        2.11
Carbohydrt_(g)                 0.06
Fiber_TD_(g)                      0
Sugar_Tot_(g)                  0.06
Calcium_(mg)                     24
Iron_(mg)                      0.02
Magnesium_(mg)                    2
Phosphorus_(mg)                  24
Potassium_(mg)                   24
Sodium_(mg)                     643
Zinc_(mg)                      0.09
Copper_(mg)                       0
Manganese_(mg)                    0
Selenium_(mcg)                    1
Vit_C_(mg)                        0
Thiamin_(mg)                  0.005
Riboflavin_(mg)               0.034
Niacin_(mg)                   0.042
Vit_B6_(mg)                   0.003
Vit_B12_(mcg)                  0.17
Vit_A_IU                       2499
Vit_A_RAE                   

In [19]:
# Series object representing the seventh row.
food_info.loc[6]

NDB_No                         1007
Shrt_Desc          CHEESE CAMEMBERT
Water_(g)                      51.8
Energ_Kcal                      300
Protein_(g)                    19.8
Lipid_Tot_(g)                 24.26
Ash_(g)                        3.68
Carbohydrt_(g)                 0.46
Fiber_TD_(g)                      0
Sugar_Tot_(g)                  0.46
Calcium_(mg)                    388
Iron_(mg)                      0.33
Magnesium_(mg)                   20
Phosphorus_(mg)                 347
Potassium_(mg)                  187
Sodium_(mg)                     842
Zinc_(mg)                      2.38
Copper_(mg)                   0.021
Manganese_(mg)                0.038
Selenium_(mcg)                 14.5
Vit_C_(mg)                        0
Thiamin_(mg)                  0.028
Riboflavin_(mg)               0.488
Niacin_(mg)                    0.63
Vit_B6_(mg)                   0.227
Vit_B12_(mcg)                   1.3
Vit_A_IU                        820
Vit_A_RAE                   

In [21]:
hundredth_row = food_info.loc[99]
print(hundredth_row)

NDB_No                                  1111
Shrt_Desc          MILK SHAKES THICK VANILLA
Water_(g)                              74.45
Energ_Kcal                               112
Protein_(g)                             3.86
Lipid_Tot_(g)                           3.03
Ash_(g)                                 0.91
Carbohydrt_(g)                         17.75
Fiber_TD_(g)                               0
Sugar_Tot_(g)                          17.75
Calcium_(mg)                             146
Iron_(mg)                                0.1
Magnesium_(mg)                            12
Phosphorus_(mg)                          115
Potassium_(mg)                           183
Sodium_(mg)                               95
Zinc_(mg)                               0.39
Copper_(mg)                            0.051
Manganese_(mg)                         0.014
Selenium_(mcg)                           2.3
Vit_C_(mg)                                 0
Thiamin_(mg)                            0.03
Riboflavin