# Creating, reading, and writing reference

The first step in any data analytics project is setting up the data. In this lesson, you'll look at exercises on creating `Series` and `DataFrame` objects, both by hand and by reading data from disk.

First you'll import pandas in a very conventional way:

In [None]:
import pandas as pd

## Creating data

There are two core objects in `pandas`: the **DataFrame** and the **Series**.

A DataFrame is a table. It contains an array of individual *entries*, each of which has a certain *value*. Each entry corresponds to a row (or *record*) and a *column*.

For example, consider the following simple `DataFrame`:

In [None]:
pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

In this example, the "0, No" entry has the value of 131. The "0, Yes" entry has a value of 50, and so on.
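One way to check those entries is to look them up by row label and column name. The sketch below uses the `.loc` accessor, which this lesson has not introduced yet, and the variable name `votes` is only illustrative:

```python
import pandas as pd

# Same DataFrame as above; `votes` is an illustrative name.
votes = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

# .loc[row_label, column_label] returns a single entry.
print(votes.loc[0, 'No'])   # 131
print(votes.loc[0, 'Yes'])  # 50
print(votes.loc[1, 'No'])   # 2
```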

`DataFrame` entries are not limited to integers. For instance, here's a `DataFrame` whose values are strings:

In [None]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})

We are using the `pd.DataFrame` constructor to generate these `DataFrame` objects. The syntax for declaring a new one is a dictionary whose keys are the column names (`Bob` and `Sue` in this example), and whose values are lists of entries. This is the standard way of constructing a new `DataFrame`, and the one you are most likely to encounter.

The dictionary-list constructor assigns values to the *column labels*, but just uses an ascending count from 0 (0, 1, 2, 3, ...) for the *row labels*. Sometimes this is OK, but oftentimes we will want to assign these labels ourselves.

The list of row labels used in a `DataFrame` is known as an **Index**. We can assign values to it by using an `index` parameter in our constructor:

In [None]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 
              'Sue': ['Pretty good.', 'Bland.']},
             index=['Product A', 'Product B'])
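To see where those labels end up, you can inspect the `index` and `columns` attributes of the result. This is a small sketch; the variable name `reviews` is an assumption made for illustration, and `.loc` is used here only to show a lookup by row label:

```python
import pandas as pd

# Same DataFrame as above, stored under an illustrative name.
reviews = pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
                        'Sue': ['Pretty good.', 'Bland.']},
                       index=['Product A', 'Product B'])

# Row labels come from the `index` argument, column labels from the dict keys.
print(list(reviews.index))    # ['Product A', 'Product B']
print(list(reviews.columns))  # ['Bob', 'Sue']

# Entries can now be looked up by the custom row label.
print(reviews.loc['Product B', 'Sue'])  # Bland.
```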

A Series, by contrast, is a sequence of data values. If a `DataFrame` is a table, a `Series` is a list. And in fact you can create one with nothing more than a list:

In [None]:
pd.Series([1, 2, 3, 4, 5])

A `Series` is, in essence, a single column of a `DataFrame`. So you can assign row labels to the `Series` the same way as before, using an `index` parameter. However, a `Series` does not have a column name; it only has one overall `name`:

In [None]:
pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')
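The `name` and `index` set in the constructor can be read back as attributes, and the index labels can be used to look up values. A minimal sketch, with `sales` as an illustrative variable name:

```python
import pandas as pd

# Same Series as above, stored under an illustrative name.
sales = pd.Series([30, 35, 40],
                  index=['2015 Sales', '2016 Sales', '2017 Sales'],
                  name='Product A')

print(sales.name)           # Product A
print(list(sales.index))    # ['2015 Sales', '2016 Sales', '2017 Sales']

# Look up a value by its row label.
print(sales['2016 Sales'])  # 35
```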

The `Series` and the `DataFrame` are intimately related. It's helpful to think of a `DataFrame` as actually being just a bunch of `Series` "glued together". We'll see more of this in the next section of this tutorial.
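One way to make the "glue" idea concrete is to build a `DataFrame` from two named `Series` with `pd.concat`. This is only a sketch: the `Product B` figures and all variable names are made up for illustration.

```python
import pandas as pd

years = ['2015 Sales', '2016 Sales', '2017 Sales']

# Two Series sharing the same index; the Product B numbers are illustrative.
product_a = pd.Series([30, 35, 40], index=years, name='Product A')
product_b = pd.Series([21, 34, 11], index=years, name='Product B')

# axis=1 places the Series side by side; each name becomes a column label.
sales = pd.concat([product_a, product_b], axis=1)

print(list(sales.columns))                # ['Product A', 'Product B']
print(sales.shape)                        # (3, 2)

# Selecting one column gives back a Series.
print(type(sales['Product A']).__name__)  # Series
print(list(sales['Product A']))           # [30, 35, 40]
```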

## Reading data

Being able to create a `DataFrame` and `Series` by hand is handy. But we usually read existing data rather than creating it.

By far the most common format for storing and sharing data is the humble CSV file. If you open a CSV file in a text editing program, it looks like this:

```csv
Product A,Product B,Product C
30,21,9
35,34,1
41,11,11
```

It is just a table of values separated by commas. Hence the name: "Comma-Separated Values", or CSV.
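To see how such text maps onto a table, you can parse it without touching the file system. The sketch below feeds the same table to `pd.read_csv` through `io.StringIO`, written without trailing commas so that each row has exactly three fields. The names `csv_text` and `products` are illustrative.

```python
import io
import pandas as pd

# The CSV table from above as a plain string (no trailing commas).
csv_text = """Product A,Product B,Product C
30,21,9
35,34,1
41,11,11
"""

# read_csv accepts any file-like object, not only a path.
products = pd.read_csv(io.StringIO(csv_text))

print(products.shape)                # (3, 3)
print(list(products.columns))        # ['Product A', 'Product B', 'Product C']
print(products.loc[0, 'Product A'])  # 30
```

The first line of the text becomes the column labels, and each later line becomes one row.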

You'll use the `read_csv` function to read the data into a `DataFrame`.

In [None]:
wine_reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv")

The `shape` attribute shows how large the resulting `DataFrame` is:

In [None]:
wine_reviews.shape

The `DataFrame` has roughly 130,000 records and 14 columns. That's almost 2 million entries!
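`shape` is a `(rows, columns)` tuple, so the number of entries is the product of its two numbers. The sketch below shows this on a small stand-in `DataFrame` with made-up values, not on the wine data, which it does not load:

```python
import pandas as pd

# A tiny stand-in DataFrame; the values are illustrative.
small = pd.DataFrame({'Yes': [50, 21, 7], 'No': [131, 2, 40]})

n_rows, n_cols = small.shape
print(small.shape)      # (3, 2)
print(n_rows * n_cols)  # 6

# The `size` attribute reports the same entry count directly.
print(small.size)       # 6
```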

You can examine the contents of the resulting `DataFrame` using the `head()` method, which grabs the first five rows:

In [None]:
wine_reviews.head()

The `pandas` `read_csv` function has over 30 optional parameters you can specify. For example, you can see that the CSV file in this dataset has a built-in index, which `pandas` did not pick up automatically. To make `pandas` use that column for the index (instead of creating a new one from scratch), specify an `index_col`:

In [None]:
wine_reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
wine_reviews.head()
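The effect of `index_col` is easier to see on a tiny example. The sketch below uses made-up CSV text (not the wine data) whose first column has a blank header and holds row numbers, and reads it both ways. The names `csv_text`, `default`, and `indexed` are illustrative.

```python
import io
import pandas as pd

# Made-up CSV text: the first column has no header and holds row numbers.
csv_text = """,country,points
0,Italy,87
1,Portugal,90
2,US,85
"""

# Without index_col, the blank-header column is read as ordinary data.
default = pd.read_csv(io.StringIO(csv_text))
print(list(default.columns))  # ['Unnamed: 0', 'country', 'points']

# With index_col=0, that column becomes the row labels instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed.columns))  # ['country', 'points']
print(list(indexed.index))    # [0, 1, 2]
```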

## Writing data

The opposite of `read_csv`, which reads our data, is `to_csv`, which writes it. With CSV files it's dead simple:

In [None]:
wine_reviews.head().to_csv("wine_reviews.csv")

Painless!
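A quick way to check what `to_csv` produces is a round trip: write a small `DataFrame` out as CSV and read it back. In the sketch below, `to_csv()` is called without a path so that it returns the CSV text as a string instead of writing a file. The `DataFrame` and the variable names are illustrative.

```python
import io
import pandas as pd

# A small illustrative DataFrame with custom row labels.
reviews = pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
                        'Sue': ['Pretty good.', 'Bland.']},
                       index=['Product A', 'Product B'])

# With no path argument, to_csv returns the CSV text.
csv_text = reviews.to_csv()
print(csv_text.splitlines()[0])  # ,Bob,Sue

# The index was written as the first column, so read it back with index_col=0.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(restored.index))                # ['Product A', 'Product B']
print(restored.loc['Product A', 'Bob'])    # I liked it.
```

Note that the row labels are written as an unnamed first column, which is why `index_col=0` is needed when reading the file back.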