# [Creating, reading, and writing reference](https://www.kaggle.com/residentmario/creating-reading-and-writing-reference)

This is the reference component to the "Creating, reading, and writing" section of the tutorial.  
For the workbook section, [click here](https://www.kaggle.com/residentmario/creating-reading-and-writing-workbook).  
The very first step in any data analytics project will probably be reading the data out of a file somewhere, so that's where we'll begin.  
In this section, we'll look at exercises on creating `pandas` `Series` and `DataFrame` objects, both by hand and by reading data from storage.  
The [IO Tools section](http://pandas.pydata.org/pandas-docs/stable/io.html) of the official `pandas` docs provides a comprehensive overview of this subject.

In [9]:
import pandas as pd 

### Creating data

There are two core objects in `pandas`:  
the **`DataFrame`** and the **`Series`**.  

A `DataFrame` is a table.  
It contains an array of individual entries, each of which has a certain value.  
Each entry corresponds to a row (or record) and a column.  
For example, consider the following simple DataFrame:

In [10]:
pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

|   | No | Yes |
|---|---|---|
| 0 | 131 | 50 |
| 1 | 2 | 21 |
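
To make the "row and column" idea concrete, here is a minimal sketch that looks up individual entries of the same table. It assumes the label-based accessor `.loc`, which is not introduced in this section, so treat it as a preview rather than something you need to know yet.

```python
import pandas as pd

# Same table as above: two columns, with rows labeled 0 and 1 by default.
df = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

# .loc takes a row label first, then a column name.
entry = df.loc[0, 'Yes']   # the entry in row 0, column 'Yes'
other = df.loc[1, 'No']    # the entry in row 1, column 'No'

# shape is (number of rows, number of columns).
n_rows, n_cols = df.shape
print(entry, other, n_rows, n_cols)
```

Because the lookup uses the column name rather than its position, it does not depend on the order in which the columns are displayed.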


`DataFrame` entries aren't limited to integers.  
Here's a `DataFrame` with strings as values:

In [11]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})

|   | Bob | Sue |
|---|---|---|
| 0 | I liked it. | Pretty good. |
| 1 | It was awful. | Bland. |


We are using the `pd.DataFrame` constructor to generate these `DataFrame` objects.  
The syntax for declaring a new one is a dictionary whose keys are the column names (`Bob` and `Sue` in this example) and whose values are lists of entries, one list per column.  
This is the standard way of constructing a new `DataFrame`, and the one you will usually encounter.  
The dictionary-list constructor assigns values to the column labels, but just uses an ascending count from 0 (0, 1, 2, 3, ...) for the row labels.  
Sometimes this is OK, but oftentimes we will want to assign these labels ourselves.  
The list of row labels used in a `DataFrame` is known as an **`Index`**.  
We can assign values to it by passing an `index` parameter to the constructor:

In [12]:
pd.DataFrame({'Bob': ['I loved it.', 'I hated it.'],
              'Sue': ['That was okay.', 'That was not okay.']},
            index=['Product A', 'Product B'])

|   | Bob | Sue |
|---|---|---|
| Product A | I loved it. | That was okay. |
| Product B | I hated it. | That was not okay. |
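
As a rough illustration of the difference between the default row labels and an explicit `Index`, the sketch below builds the same data both ways and inspects the labels. The variable names (`reviews`, `default_df`, `labeled_df`) are made up for this example.

```python
import pandas as pd

reviews = {'Bob': ['I loved it.', 'I hated it.'],
           'Sue': ['That was okay.', 'That was not okay.']}

# Without an index argument, rows get an ascending count starting at 0.
default_df = pd.DataFrame(reviews)

# With an index argument, rows get the labels we supply.
labeled_df = pd.DataFrame(reviews, index=['Product A', 'Product B'])

default_labels = list(default_df.index)   # row labels of the first table
custom_labels = list(labeled_df.index)    # row labels of the second table
columns = list(labeled_df.columns)        # column labels come from the dict keys
print(default_labels, custom_labels, columns)
```

The list passed as `index` needs one label per row, so here it must have exactly two elements.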


A `Series` is a sequence of values.  
If a `DataFrame` is a table (or a matrix), then a `Series` is a list (or a vector).  
In fact, you can create a `Series` with nothing more than a list:

In [13]:
pd.Series([1, 2, 3, 4, 5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

Think of a `Series` as a single column in a `DataFrame`.  
You can assign row labels to the `Series` (same as above) using the `index` parameter.  
A `Series` has no column names; instead, the `Series` itself has one overall `name` that can be assigned:

In [14]:
pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')

2015 Sales    30
2016 Sales    35
2017 Sales    40
Name: Product A, dtype: int64
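
The "single column" picture can be checked directly. The sketch below is one possible illustration: it places two `Series` side by side with `pd.concat` to form a `DataFrame`, then pulls one column back out. The `Product B` series and its numbers are invented for this example and are not part of the tutorial's data.

```python
import pandas as pd

years = ['2015 Sales', '2016 Sales', '2017 Sales']
product_a = pd.Series([30, 35, 40], index=years, name='Product A')
# Hypothetical second product, added only so the table has two columns.
product_b = pd.Series([10, 20, 30], index=years, name='Product B')

# Joining Series along axis=1 gives a DataFrame;
# each Series name becomes a column name.
sales = pd.concat([product_a, product_b], axis=1)

# Selecting a single column gives back a Series
# with the same index and the column name as its name.
column = sales['Product A']
print(type(column).__name__, column.name)
```

Going in both directions this way shows that a `DataFrame` can be thought of as several `Series` that share one `Index`.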

Now that you can see the relationship between the `Series` and the `DataFrame`, it's time to move on.

### Reading common file formats