# Reading data files from different sources and writing data to different file formats

Although it is great that we can create Series and DataFrame objects from our own data, in the real world we will mostly work with data that already exists. After cleaning up a data file, we may also want to export the cleaned data to another file for future use. This is why knowing how to read and write data files in pandas is so important.

In [1]:
# import statements
import numpy as np
import pandas as pd

--------------------
## Reading Data
--------------------

#### CSV/TSV files (and other similar filetypes)

In [2]:
# pd.read_csv(filepath_or_buffer, sep=",", delimiter=None, index_col=None, dtype=None, na_values=None)

# filepath_or_buffer: a path string, a path object, or a file-like object (anything with a read() method). A valid URL can also be passed.
# sep: the field separator (e.g. for tsv files, sep='\t')
# delimiter: alias for sep
# index_col: int, str, or sequence of int / str. Column(s) to use as the row labels of the DataFrame, given either as a column name or as a column index. If a sequence of int / str is given, a MultiIndex is used.
# dtype: data type to apply to the whole dataset, or a dict mapping column names to data types
# na_values: additional strings to recognize as NA/NaN
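
The parameters above can be tried on a small in-memory dataset. The following is a minimal sketch, not a recipe: the tab-separated text, the column names (`id`, `make`, `city_mpg`) and the `"missing"` placeholder are all made up for illustration, and `io.StringIO` stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical tab-separated data, kept in memory so no file is needed.
tsv_text = (
    "id\tmake\tcity_mpg\n"
    "1\tAudi\t19\n"
    "2\tFerrari\tmissing\n"
    "3\tDodge\t23\n"
)

df = pd.read_csv(
    io.StringIO(tsv_text),    # any object with a read() method is accepted
    sep="\t",                 # fields are separated by tabs
    index_col="id",           # use the "id" column as the row labels
    dtype={"make": str},      # force the "make" column to be read as text
    na_values=["missing"],    # treat the string "missing" as NaN
)

print(df)
print(df.dtypes)
```

Because one `city_mpg` entry is read as NaN, pandas stores that column as floats rather than integers.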

`pd.read_csv` can also read a zip archive directly, without extracting it first, as long as the archive contains exactly one csv/tsv file. If the archive contains multiple files, pandas cannot read it directly, so it must be unzipped before use.
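
To illustrate the zip behaviour, here is a small sketch that builds throwaway archives in a temporary directory. The file names (`sample.csv.zip`, `multi.zip`, `a.csv`, `b.csv`) and the data are hypothetical. For the multi-file case it assumes that opening one member with the standard-library `zipfile` module is an acceptable alternative to extracting the archive to disk.

```python
import os
import tempfile
import zipfile

import pandas as pd

csv_text = "make,year\nAudi,2013\nDodge,1985\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Archive with a single csv file: pandas can read it directly,
    # inferring the compression from the ".zip" extension.
    zip_path = os.path.join(tmp_dir, "sample.csv.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("sample.csv", csv_text)
    df_single = pd.read_csv(zip_path)

    # Archive with several files: pick one member explicitly and
    # pass the opened file-like object to read_csv.
    multi_path = os.path.join(tmp_dir, "multi.zip")
    with zipfile.ZipFile(multi_path, "w") as archive:
        archive.writestr("a.csv", csv_text)
        archive.writestr("b.csv", csv_text)
    with zipfile.ZipFile(multi_path) as archive:
        with archive.open("b.csv") as member:
            df_member = pd.read_csv(member)

print(df_single)
print(df_member)
```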

In [3]:
df_vehicles = pd.read_csv("Data/vehicles.csv.zip")



In [4]:
# see the first 5 rows of data
df_vehicles.head()

Unnamed: 0,barrels08,barrelsA08,charge120,charge240,city08,city08U,cityA08,cityA08U,cityCD,cityE,...,mfrCode,c240Dscr,charge240b,c240bDscr,createdOn,modifiedOn,startStop,phevCity,phevHwy,phevComb
0,15.695714,0.0,0.0,0.0,19,0.0,0,0.0,0.0,0.0,...,,,0.0,,Tue Jan 01 00:00:00 EST 2013,Tue Jan 01 00:00:00 EST 2013,,0,0,0
1,29.964545,0.0,0.0,0.0,9,0.0,0,0.0,0.0,0.0,...,,,0.0,,Tue Jan 01 00:00:00 EST 2013,Tue Jan 01 00:00:00 EST 2013,,0,0,0
2,12.207778,0.0,0.0,0.0,23,0.0,0,0.0,0.0,0.0,...,,,0.0,,Tue Jan 01 00:00:00 EST 2013,Tue Jan 01 00:00:00 EST 2013,,0,0,0
3,29.964545,0.0,0.0,0.0,10,0.0,0,0.0,0.0,0.0,...,,,0.0,,Tue Jan 01 00:00:00 EST 2013,Tue Jan 01 00:00:00 EST 2013,,0,0,0
4,17.347895,0.0,0.0,0.0,17,0.0,0,0.0,0.0,0.0,...,,,0.0,,Tue Jan 01 00:00:00 EST 2013,Tue Jan 01 00:00:00 EST 2013,,0,0,0


--------------------
## Write Data to Files
--------------------

#### CSV/TSV files (and other similar filetypes)
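
As a quick orientation, the sketch below writes a tiny DataFrame with `DataFrame.to_csv`, using a handful of its parameters (`sep`, `na_rep`, `columns`, `header`, `index`). The data and column names are invented for illustration, and an in-memory `io.StringIO` buffer is used in place of a file path so the example leaves nothing on disk.

```python
import io

import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({"make": ["Audi", "Dodge"], "city_mpg": [19.0, None]})

# Write tab-separated text into a buffer, spelling missing values as "NA"
# and leaving out the row index.
buffer = io.StringIO()
df.to_csv(buffer, sep="\t", na_rep="NA", index=False)
tsv_text = buffer.getvalue()

# With no path or buffer given, to_csv returns the text as a string.
# Here only the "make" column is written, without a header row.
csv_text = df.to_csv(columns=["make"], header=False, index=False)

print(tsv_text)
print(csv_text)
```

Passing `index=False` is a common choice when the index is just the default row numbering; otherwise it is written as an extra unnamed first column, which shows up as `Unnamed: 0` when the file is read back.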

In [5]:
df_vehicles.to_csv?

Signature:
df_vehicles.to_csv(
    path_or_buf: 'FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None' = None,
    sep: 'str' = ',',
    na_rep: 'str' = '',
    float_format: 'str | Callable | None' = None,
    columns: 'Sequence[Hashable] | None' = None,
    header: 'bool_t | list[str]' = True,
    index: 'bool_t' = True,
[0;34m[0m    [