## CSV to Pandas Data Frame

Let us see how to create **Pandas Data Frames** from data in files. `read_csv` is the most commonly used API for reading delimited text files into a Data Frame.
* Here are some of the important options.
  * sep or delimiter
  * header or names
  * index_col
  * dtype
  * and many more
* Pandas provides several other APIs that can be used to create Data Frames.
  * read_fwf
  * read_table
  * pandas.io.json
  * and more
* Here is how we can create a Data Frame for orders dataset.
  * Delimiter in our data is **,** which is default for Pandas `read_csv`.
  * There is no Header and hence we have to set keyword argument `header` to None.
  * We can pass the column names as a list using keyword argument `names`.
  * The data type of each column is typically inferred from the data; however, we can explicitly specify data types using `dtype`.
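
Here is a minimal sketch of how `sep`, `header`, `names` and `index_col` fit together. It reads from an in-memory string instead of a file, so the rows and the `sample_*` names are illustrative assumptions and not the actual contents of the orders dataset.

```python
import io

import pandas as pd

# Illustrative rows only (an assumption, not read from /data/retail_db).
# They follow the same four-column layout as the orders data.
sample_csv = io.StringIO(
    "1,2013-07-25 00:00:00.0,11599,CLOSED\n"
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT\n"
    "3,2013-07-25 00:00:00.0,12111,COMPLETE\n"
)

sample_schema = ["order_id", "order_date", "order_customer_id", "order_status"]

sample_orders = pd.read_csv(
    sample_csv,
    sep=",",               # the default, spelled out for clarity
    header=None,           # the data has no header row
    names=sample_schema,   # so we supply the column names ourselves
    index_col="order_id",  # use order_id as the index instead of 0, 1, 2, ...
)
```

With `index_col` set, `order_id` becomes the index of `sample_orders` and the remaining three columns are regular columns. Leaving out `index_col` keeps the default integer index, which is what the rest of this notebook relies on.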
  
```{note}
We will be running this notebook from other notebooks to create orders and order_items data frames while exploring Pandas libraries. 

Make sure you comment out all the informational lines, so that output is not printed when we invoke this notebook from other notebooks.
```

In [2]:
import pandas as pd

In [3]:
# pd.read_csv?

In [4]:
%%sh

# ls -ltr /data/retail_db/orders/part-00000

In [5]:
%%sh

# tail /data/retail_db/orders/part-00000

In [6]:
%%sh

# head /data/retail_db/orders/part-00000

In [7]:
orders_path = "/data/retail_db/orders/part-00000"

In [8]:
orders_schema = [
    "order_id",
    "order_date",
    "order_customer_id",
    "order_status"
]

In [9]:
orders = pd.read_csv(orders_path,
                     delimiter=',',
                     header=None,
                     names=orders_schema
                    )

In [10]:
# orders

In [11]:
# orders.head(10)

In [12]:
order_items_path = "/data/retail_db/order_items/part-00000"

In [13]:
%%sh

# ls -ltr /data/retail_db/order_items/part-00000

In [14]:
%%sh

# tail /data/retail_db/order_items/part-00000

In [15]:
%%sh

# head /data/retail_db/order_items/part-00000

In [16]:
order_items_schema = [
    "order_item_id",
    "order_item_order_id",
    "order_item_product_id",
    "order_item_quantity",
    "order_item_subtotal",
    "order_item_product_price"
]

In [17]:
order_items = pd.read_csv(order_items_path,
                          delimiter=',',
                          header=None,
                          names=order_items_schema
                         )

In [18]:
order_items.dtypes


order_item_id                 int64
order_item_order_id           int64
order_item_product_id         int64
order_item_quantity           int64
order_item_subtotal         float64
order_item_product_price    float64
dtype: object
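
The types above were inferred by Pandas. As a rough sketch of the `dtype` option, we can pass a dict that maps column names to types to override the inference for selected columns. The rows, the `sample_*` names and the choice of `int32` and `float32` are assumptions made only for illustration.

```python
import io

import pandas as pd

# Illustrative rows only (an assumption), in the order_items layout.
sample_csv = io.StringIO(
    "1,1,957,1,299.98,299.98\n"
    "2,2,1073,1,199.99,199.99\n"
)

sample_order_items_schema = [
    "order_item_id",
    "order_item_order_id",
    "order_item_product_id",
    "order_item_quantity",
    "order_item_subtotal",
    "order_item_product_price"
]

sample_order_items = pd.read_csv(
    sample_csv,
    header=None,
    names=sample_order_items_schema,
    # Columns listed here use the given type; all others are still inferred.
    dtype={
        "order_item_quantity": "int32",
        "order_item_subtotal": "float32"
    }
)
```

Columns that are not listed in the dict, such as `order_item_id` and `order_item_product_price`, keep their inferred types.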

In [19]:
order_items.head(10)

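| | order_item_id | order_item_order_id | order_item_product_id | order_item_quantity | order_item_subtotal | order_item_product_price |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 957 | 1 | 299.98 | 299.98 |
| 1 | 2 | 2 | 1073 | 1 | 199.99 | 199.99 |
| 2 | 3 | 2 | 502 | 5 | 250.0 | 50.0 |
| 3 | 4 | 2 | 403 | 1 | 129.99 | 129.99 |
| 4 | 5 | 4 | 897 | 2 | 49.98 | 24.99 |
| 5 | 6 | 4 | 365 | 5 | 299.95 | 59.99 |
| 6 | 7 | 4 | 502 | 3 | 150.0 | 50.0 |
| 7 | 8 | 4 | 1014 | 4 | 199.92 | 49.98 |
| 8 | 9 | 5 | 957 | 1 | 299.98 | 299.98 |
| 9 | 10 | 5 | 365 | 5 | 299.95 | 59.99 |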
