# Creating, Reading and Writing

## Getting Started

To use pandas, you'll typically start with the following line of code:

In [None]:
import pandas as pd

## Creating Data

There are two core objects in pandas: the **DataFrame** and the **Series**.

### DataFrame

A *DataFrame* is a table. It contains an array of individual entries, each of which has a certain value. Each entry corresponds to a row (or record) and a column.
For example, consider the following simple DataFrame:

In [None]:
pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

|   | Yes | No |
|---|-----|-----|
| 0 | 50 | 131 |
| 1 | 21 | 2 |


DataFrame entries are not limited to integers. For instance, here's a DataFrame whose values are strings:

In [3]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})

|   | Bob | Sue |
|---|-----|-----|
| 0 | I liked it. | Pretty good. |
| 1 | It was awful. | Bland. |


We are using the ```pd.DataFrame()``` constructor to generate these DataFrame objects. To declare a new one, we pass a dictionary whose keys are the column names (Bob and Sue in this example) and whose values are lists of entries. This is the standard way of constructing a new DataFrame, and the one you are most likely to encounter.
The dictionary-list constructor assigns values to the *column labels*, but just uses an ascending count from 0 (0, 1, 2, 3, ...) for the *row labels*.
The list of row labels used in a DataFrame is known as an **Index**. We can assign values to it by using an *index* parameter in our constructor:

In [4]:
product_reviews = pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']}, index=['Product A', 'Product B'])

We can also create a CSV file from this DataFrame by using the ```to_csv()``` method:

In [5]:
product_reviews.to_csv('product_reviews.csv')
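
As a rough check of what ```to_csv()``` writes, the sketch below saves the same DataFrame and reads it back. The temporary directory and the ```index_col=0``` argument are choices made for this illustration only; they are not part of the walkthrough above.

```python
import os
import tempfile

import pandas as pd

product_reviews = pd.DataFrame(
    {'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']},
    index=['Product A', 'Product B'],
)

# Write into a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'product_reviews.csv')
    product_reviews.to_csv(csv_path)

    # The row labels are written as the first column, under an empty header.
    with open(csv_path) as f:
        first_line = f.readline().strip()

    # index_col=0 asks read_csv to use that first column as the Index again.
    restored = pd.read_csv(csv_path, index_col=0)

print(first_line)
print(restored)
```

Without ```index_col=0```, the row labels would come back as an ordinary column and the DataFrame would get a fresh 0, 1, 2, ... index.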

### Series

A *Series*, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list. And in fact you can create one with nothing more than a list:

In [6]:
pd.Series([1, 2, 3, 4, 5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

A Series is, in essence, a single column of a DataFrame. So you can assign row labels to the Series the same way as before, using an ```index``` parameter. However, a Series does not have a column name; it only has one overall ```name```:

In [7]:
pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')

2015 Sales    30
2016 Sales    35
2017 Sales    40
Name: Product A, dtype: int64

The Series and the DataFrame are intimately related. It's helpful to think of a DataFrame as actually being just a bunch of Series "glued together".
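
One way to make the "glued together" picture concrete is to build a DataFrame out of two Series and then pull a column back out. This is a minimal sketch: the use of ```pd.concat``` and the second Series (```product_b```, with illustrative values) are assumptions added here.

```python
import pandas as pd

labels = ['2015 Sales', '2016 Sales', '2017 Sales']
product_a = pd.Series([30, 35, 40], index=labels, name='Product A')
# Illustrative values, not taken from a real dataset.
product_b = pd.Series([21, 34, 11], index=labels, name='Product B')

# Concatenating along axis=1 places the Series side by side;
# each Series name becomes a column label.
sales = pd.concat([product_a, product_b], axis=1)
print(sales)

# Selecting a single column hands back a Series carrying that name.
column = sales['Product A']
print(type(column).__name__, column.name)
```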

## Reading Data Files

Data can be stored in any of a number of different forms and formats. By far the most basic of these is the humble CSV file. When you open a CSV file you get something that looks like this:

<code>
Product A,Product B,Product C
30,21,9
35,34,1
41,11,11
</code>

So a CSV file is a table of values separated by commas. Hence the name: "Comma-Separated Values", or CSV.
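
To see how such text maps onto a table, the following sketch parses a small comma-separated string like the one above. Wrapping the text in ```io.StringIO``` is just a convenience for this example, so that no file on disk is needed.

```python
import io

import pandas as pd

csv_text = """Product A,Product B,Product C
30,21,9
35,34,1
41,11,11
"""

# The first line supplies the column labels; each later line is one row.
products = pd.read_csv(io.StringIO(csv_text))
print(products)
```
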
Let's now see what a real dataset looks like when we read it into a DataFrame. We'll use the ```pd.read_csv()``` function to read the data into a DataFrame.

In [8]:
br_small_caps = pd.read_csv('statusinvest-busca-avancada.csv', delimiter=';')

Here we also pass the ```delimiter``` parameter (an alias for ```sep```), because this file separates its values with semicolons rather than commas.
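
The effect of the delimiter is easiest to see on a tiny example. In the sketch below the column names are borrowed from the real file, but the rows and tickers are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data (invented rows).
raw = "TICKER;PRECO;DY\nAAAA3;10.5;2.0\nBBBB3;7.25;\n"

# With the default comma separator, every line is read as one wide column.
wrong = pd.read_csv(io.StringIO(raw))
print(wrong.shape)

# Naming the delimiter splits the fields as intended.
right = pd.read_csv(io.StringIO(raw), delimiter=';')
print(right.shape)
print(right)
```

A DataFrame with a single column whose name contains the separator is a common sign that the wrong delimiter was used.
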
We can use the ```shape``` attribute to check how large the resulting DataFrame is:

In [9]:
br_small_caps.shape

(56, 30)

So our new DataFrame has 56 records split across 30 columns. That's 1680 entries (cells)!
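
The same arithmetic can be checked on any DataFrame. Here is a minimal sketch using the small Yes/No table from earlier; the variable names are arbitrary.

```python
import pandas as pd

small = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = small.shape
print(n_rows, n_cols)

# size is the total number of cells, i.e. rows * columns.
print(small.size)
```
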
We can examine the contents of the resulting DataFrame using the ```head()``` method, which returns the first five rows by default:

In [10]:
br_small_caps.head()

|   | TICKER | PRECO | DY | P/L | P/VP | P/ATIVOS | MARGEM BRUTA | MARGEM EBIT | MARG. LIQUIDA | P/EBIT | ... | PATRIMONIO / ATIVOS | PASSIVOS / ATIVOS | GIRO ATIVOS | CAGR RECEITAS 5 ANOS | CAGR LUCROS 5 ANOS | LIQUIDEZ MEDIA DIARIA | VPA | LPA | PEG Ratio | VALOR DE MERCADO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AGRO3 | 27.55 | 11.66 | 11.95 | 1.45 | 0.85 | 27.98 | 15.46 | 21.58 | 16.68 | ... | 0.59 | 0.41 | 0.33 | 27.22 | 13.38 | 6.609.378.40 | 19.06 | 2.31 | 0.04 | 2.828.928.882.20 |
| 1 | ATOM3 | 2.03 |  | 2.5 | 1.23 | 0.91 | 91.76 | -3.39 | 84.24 | -62.11 | ... | 0.74 | 0.21 | 0.43 | 31.18 | 22.25 | 13.557.37 | 1.65 | 0.81 | 0.01 | 48.323.942.94 |
| 2 | BLAU3 | 11.29 | 2.48 | 8.51 | 1.0 | 0.64 | 34.09 | 22.18 | 16.14 | 6.19 | ... | 0.64 | 0.36 | 0.47 | 11.91 | 14.06 | 2.278.661.06 | 11.26 | 1.33 | -0.26 | 2.025.357.571.31 |
| 3 | BOAS3 | 7.95 |  | 15.22 | 1.83 | 1.7 | 56.53 | 19.25 | 32.39 | 25.6 | ... | 0.93 | 0.07 | 0.34 | 8.81 | 74.34 | 53.310.276.50 | 4.34 | 0.52 | 5.59 | 4.212.250.617.75 |
| 4 | BRBI11 | 14.5 | 9.1 | 8.88 | 1.84 | 0.14 | 100.0 | 8.88 | 61.44 | 61.47 | ... | 0.07 | 0.93 | 0.02 | -16.03 | 27.88 | 3.910.827.37 | 7.87 | 1.63 | 0.4 | 1.522.437.708.00 |
