## Introduction

In this micro-course, we'll learn all about pandas, the most popular Python library for data analysis.

Along the way, we'll complete several hands-on exercises with real-world data. We recommend that you work on the exercises while reading the corresponding tutorial.

* In this tutorial...
    * we'll learn how to create our own data
    * along with how to work with data that already exists.

## Getting started
To use pandas, you'll typically start with the following line of code.

In [1]:
import pandas as pd

## Creating data

There are two core objects in pandas: <br>
* DataFrame
* Series

### DataFrame
A DataFrame is a table. <br>
It contains an array of individual entries, each of which has a certain value. Each entry corresponds to a row (or record) and a column.

For example, consider the following simple DataFrame:

In [2]:
pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

|   | Yes | No |
|---|---|---|
| 0 | 50 | 131 |
| 1 | 21 | 2 |


DataFrame entries are not limited to integers. For instance, here's a DataFrame whose values are strings:

In [3]:
pd.DataFrame({'Bob': ['I liked it', 'It was awful'], 'sue': ['Pretty good', 'Bland']})

|   | Bob | sue |
|---|---|---|
| 0 | I liked it | Pretty good |
| 1 | It was awful | Bland |


We are using the <code>pd.DataFrame()</code> constructor to generate these DataFrame objects. 

The syntax for declaring a new one is a dictionary whose keys are the column names, and whose values are lists of entries.

This is the standard way of constructing a new DataFrame, and the one you are most likely to encounter.
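As a minimal sketch of the dictionary syntax described above (the variable names `data` and `votes` are only illustrative), the keys become the column names and each list supplies one column's entries. The lists need to have the same length, otherwise pandas raises a `ValueError`:

```python
import pandas as pd

# Keys become column names; each list holds that column's entries, top to bottom.
data = {'Yes': [50, 21], 'No': [131, 2]}
votes = pd.DataFrame(data)

print(list(votes.columns))  # column names, in dictionary order
print(votes.shape)          # (number of rows, number of columns)

# Lists of different lengths cannot form a rectangular table.
mismatch_failed = False
try:
    pd.DataFrame({'Yes': [50, 21], 'No': [131]})
except ValueError as err:
    mismatch_failed = True
    print('ValueError:', err)
```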

The list of row labels used in a DataFrame is known as an **Index**. We can assign values to it by using an <code>index</code> parameter in our constructor.

In [4]:
pd.DataFrame({'Bob': ['I liked it', 'It was awful'],
              'sue': ['Pretty good', 'Bland']},
             index=['Product A', 'Product B'])

|   | Bob | sue |
|---|---|---|
| Product A | I liked it | Pretty good |
| Product B | It was awful | Bland |
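To see the difference an explicit index makes, the following sketch compares the row labels of a DataFrame built with an `index` argument against one built without it. The variable names `reviews` and `default` are illustrative only. The labels are passed as a list, which keeps them in a defined order:

```python
import pandas as pd

# Row labels supplied explicitly as an ordered list.
reviews = pd.DataFrame(
    {'Bob': ['I liked it', 'It was awful'],
     'sue': ['Pretty good', 'Bland']},
    index=['Product A', 'Product B'],
)

# No index given: pandas falls back to ascending integers starting at 0.
default = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

print(list(reviews.index))  # ['Product A', 'Product B']
print(list(default.index))  # [0, 1]
```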


### Series

A <code>Series</code> is a sequence of data values.<br>
If a DataFrame is a table, a Series is a list. <br>
And in fact you can create one with nothing more than a list:

In [5]:
pd.Series([1, 2, 3, 4, 5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

A Series is, in essence, a single column of a DataFrame. So you can assign row labels to the Series the same way as before, using an <code>index</code> parameter. However, a Series does not have a column name; it only has one overall <code>name</code>:

In [7]:
pd.Series([30, 35, 40],
         index=['2015 Sales', '2016 Sales', '2017 Sales'],
         name='Product A')

2015 Sales    30
2016 Sales    35
2017 Sales    40
Name: Product A, dtype: int64

The Series and the DataFrame are intimately related. It's helpful to think of a DataFrame as actually being just a bunch of Series "glued together". We'll see more of this in the next section of this tutorial.
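One way to make the "glued together" picture concrete is the sketch below. It assumes `pd.concat` with `axis=1` as the gluing step, which is just one of several ways to combine Series, and the variable names are illustrative. Each Series name becomes a column name, and selecting a column gives back a Series:

```python
import pandas as pd

# Two named Series that share the default integer index.
yes = pd.Series([50, 21], name='Yes')
no = pd.Series([131, 2], name='No')

# Place the Series side by side; their names become the column names.
votes = pd.concat([yes, no], axis=1)
print(list(votes.columns))

# Pulling a single column back out yields a Series again.
yes_column = votes['Yes']
print(type(yes_column).__name__)
print(yes_column.name)
```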

* Summary
    * DataFrame and Series are closely related.
    * Think of a DataFrame as a set of Series glued together!
    * DataFrame == Table

## Reading data files

Being able to create a DataFrame or Series by hand is handy. But, most of the time, we won't actually be creating our own data by hand. Instead, we'll be working with data that already exists.

Data stored in a CSV file can be read into a DataFrame with <code>pd.read_csv()</code>.

The <code>pd.read_csv()</code> function is well-endowed, with over 30 optional parameters you can specify.

For example, a CSV file may already contain a built-in index column, which pandas does not pick up on automatically.
To make pandas use that column for the index (instead of creating a new one from scratch), we can specify an <code>index_col</code>.
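The effect of `index_col` can be sketched without a file on disk. The example below uses a small, made-up CSV string (`csv_text` is hypothetical sample data, not a real dataset) wrapped in `io.StringIO`, which `pd.read_csv()` accepts in place of a file path. The first column has an empty header, so pandas names it `Unnamed: 0` unless told to treat it as the index:

```python
import io

import pandas as pd

# Hypothetical CSV content whose first column holds the row labels.
csv_text = ",Yes,No\n0,50,131\n1,21,2\n"

# Default behaviour: the label column is read as ordinary data.
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# With index_col=0, the first column becomes the index instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed.columns))
print(list(indexed.index))
```

With a real file, the same call would take the file path as its first argument.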

## Reference

https://www.kaggle.com/residentmario/creating-reading-and-writing

## Glossary
* Entry: a single item (one value) in a DataFrame