# Importing and Storing Data
In this section, we will explore how to import comma-separated values (CSV) and Excel (XLSX) files into a DataFrame, as well as how to export DataFrames back to CSV or Excel formats using the Pandas library. We will also review the various parameters available to customize the import and export process effectively.

Since we will be using methods from the Pandas library, we first need to import it using:

In [2]:
import pandas as pd

## read_csv()
The [read_csv()](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) function reads a comma-separated values (CSV) file and loads it into a DataFrame.

- The file path, including the file name and its extension, is passed to read_csv() as a string.
- In the following code, the CSV file person_info_v1.csv is located in the sample_files folder.



In [4]:
pd.read_csv('sample_files/person_info_v1.csv')

,Name,Birthdate,State,Age,Weight
0,Amy,10/25/1972,NY,34,150
1,Joe,2/5/2010,NJ,21,175
2,Mia,10/23/2015,AZ,67,130
3,Amy,11/6/1983,FL,19,145
4,Liz,4/19/1994,CA,29,150
5,John,8/29/2023,OH,34,170
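As a minimal sketch that runs without the sample files, read_csv() also accepts a file-like object, so a small CSV can be read from an in-memory string. The `csv_text` content and the `people` variable are made up for illustration and are not part of the sample files.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for a small CSV file.
csv_text = """Name,State,Age
Amy,NY,34
Joe,NJ,21
"""

# read_csv() accepts a file path or any file-like object.
people = pd.read_csv(io.StringIO(csv_text))

print(people.shape)          # (2, 3)
print(list(people.columns))  # ['Name', 'State', 'Age']
```

Because no index column was requested, the rows receive a default integer index starting at 0.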


### index_col
Specifies which column to use as the index of the DataFrame, using the column’s index number or label.

In [6]:
df = pd.read_csv('sample_files/person_info_v1.csv', index_col=0)
df

Name,Birthdate,State,Age,Weight
Amy,10/25/1972,NY,34,150
Joe,2/5/2010,NJ,21,175
Mia,10/23/2015,AZ,67,130
Amy,11/6/1983,FL,19,145
Liz,4/19/1994,CA,29,150
John,8/29/2023,OH,34,170
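The description above says the index column can be given by position or by label. The sketch below compares both forms on a small, made-up CSV string (`csv_text` is an assumption, not one of the sample files).

```python
import io

import pandas as pd

# Hypothetical CSV content used only for this illustration.
csv_text = "Name,State,Age\nAmy,NY,34\nJoe,NJ,21\n"

# Select the index column by its integer position ...
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)
# ... or by its column label.
by_label = pd.read_csv(io.StringIO(csv_text), index_col='Name')

print(by_position.equals(by_label))  # True
print(by_label.loc['Joe', 'Age'])    # 21
```

Using the label tends to be more robust when the column order of the file may change.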


### parse_dates
Specifies columns to be parsed as Timestamp objects during import.

The values in the Birthdate column are currently stored as strings.

In [9]:
df.loc['Joe', 'Birthdate']

'2/5/2010'

In the following code, the Birthdate column values are imported as Timestamp objects.

In [11]:
df = pd.read_csv('sample_files/person_info_v1.csv', index_col=0, parse_dates=['Birthdate'])
df

Name,Birthdate,State,Age,Weight
Amy,1972-10-25,NY,34,150
Joe,2010-02-05,NJ,21,175
Mia,2015-10-23,AZ,67,130
Amy,1983-11-06,FL,19,145
Liz,1994-04-19,CA,29,150
John,2023-08-29,OH,34,170


In [12]:
df.loc['Joe', 'Birthdate']

Timestamp('2010-02-05 00:00:00')

In [13]:
type(df.loc['Joe', 'Birthdate'])

pandas._libs.tslibs.timestamps.Timestamp
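One practical benefit of parsing dates at import time is that the column gets a datetime dtype, which enables the `.dt` accessor. The following is a small sketch on made-up data (`csv_text` and `parsed` are illustrative names), assuming the month/day/year format used in the sample file.

```python
import io

import pandas as pd

# Hypothetical CSV content in the same month/day/year style as the sample file.
csv_text = "Name,Birthdate\nAmy,10/25/1972\nJoe,2/5/2010\n"

parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=['Birthdate'])

# The column now has a datetime dtype rather than holding plain strings.
print(pd.api.types.is_datetime64_any_dtype(parsed['Birthdate']))  # True

# Datetime columns expose components such as the year through .dt.
print(parsed['Birthdate'].dt.year.tolist())  # [1972, 2010]
```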

### header
The person_info_v2.csv file contains some unnecessary information in certain cells. As a result, importing the entire file may lead to many missing values. 

In [15]:
pd.read_csv('sample_files/person_info_v2.csv')

,Name,Unnamed: 1,Person Info,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6
0,Birthdate,,N,D,S,A,W
1,State,,,,,,
2,Age,,,,,,
3,Weight,,,,,,
4,,,Name,Birthdate,State,Age,Weight
5,,,Amy,10/25/1972,NY,34,150
6,,,Joe,2/5/2010,NJ,21,175
7,,,Mia,10/23/2015,AZ,67,130
8,,,Amy,11/6/1983,FL,19,145
9,,,Liz,4/19/1994,CA,29,150


To avoid this, we should import only the relevant data, as demonstrated in the previous section.
- First, the header parameter is used to skip the first 5 rows and treat the sixth row (index 5) as the header.

In [17]:
pd.read_csv('sample_files/person_info_v2.csv',header=5)

,Unnamed: 0,Unnamed: 1,Name,Birthdate,State,Age,Weight
0,,,Amy,10/25/1972,NY,34,150
1,,,Joe,2/5/2010,NJ,21,175
2,,,Mia,10/23/2015,AZ,67,130
3,,,Amy,11/6/1983,FL,19,145
4,,,Liz,4/19/1994,CA,29,150
5,,,John,8/29/2023,OH,34,170
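The same idea can be tried on a tiny, made-up CSV string in which two lines of notes come before the real header. This is only a sketch; `csv_text` and `table` are illustrative names, not part of the sample files.

```python
import io

import pandas as pd

# Hypothetical file content: two lines of notes, then the real header row.
csv_text = """Report,,
Generated by hand,,
Name,State,Age
Amy,NY,34
Joe,NJ,21
"""

# header=2 treats the third line (index 2) as the header;
# the lines before it are discarded.
table = pd.read_csv(io.StringIO(csv_text), header=2)

print(list(table.columns))  # ['Name', 'State', 'Age']
print(len(table))           # 2
```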


Next, the index_col parameter sets the third column (index 2), Name, as the index. Note that this does not remove the first two empty columns, which still appear in the result.

In [19]:
pd.read_csv('sample_files/person_info_v2.csv',header=5, index_col=2)

Name,Unnamed: 0,Unnamed: 1,Birthdate,State,Age,Weight
Amy,,,10/25/1972,NY,34,150
Joe,,,2/5/2010,NJ,21,175
Mia,,,10/23/2015,AZ,67,130
Amy,,,11/6/1983,FL,19,145
Liz,,,4/19/1994,CA,29,150
John,,,8/29/2023,OH,34,170


The iloc indexer can be used to exclude the first two unnecessary columns by selecting only the desired columns based on their integer positions.

In [21]:
pd.read_csv('sample_files/person_info_v2.csv',header=5, index_col=2).iloc[:,2:]

Name,Birthdate,State,Age,Weight
Amy,10/25/1972,NY,34,150
Joe,2/5/2010,NJ,21,175
Mia,10/23/2015,AZ,67,130
Amy,11/6/1983,FL,19,145
Liz,4/19/1994,CA,29,150
John,8/29/2023,OH,34,170


### usecols
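This parameter specifies which columns should be imported from the file. When usecols is combined with an integer index_col, the position is counted among the selected columns only, so index_col=0 in the following code refers to Name.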

In [23]:
pd.read_csv('sample_files/person_info_v2.csv',header=5, index_col=0, 
            usecols=['Name', 'Birthdate', 'State', 'Age', 'Weight'])

Name,Birthdate,State,Age,Weight
Amy,10/25/1972,NY,34,150
Joe,2/5/2010,NJ,21,175
Mia,10/23/2015,AZ,67,130
Amy,11/6/1983,FL,19,145
Liz,4/19/1994,CA,29,150
John,8/29/2023,OH,34,170
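As a small sketch on made-up data, usecols can be combined with a label-based index_col, which avoids having to count positions. The `csv_text` string below imitates a file with two empty leading columns; it is an assumption for illustration only.

```python
import io

import pandas as pd

# Hypothetical content with two empty leading columns.
csv_text = ",,Name,State,Age\n,,Amy,NY,34\n,,Joe,NJ,21\n"

# Import only the listed columns and use Name as the index.
trimmed = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['Name', 'Age'],
    index_col='Name',
)

print(trimmed.columns.tolist())  # ['Age']
print(trimmed.index.tolist())    # ['Amy', 'Joe']
```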


## read_excel()
The read_excel() function is similar to read_csv() but imports data from Excel files.

## to_csv()
The [to_csv()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html) method is used to write a DataFrame to a comma-separated values (CSV) file.

The file path, including the file name and its extension, is passed to the to_csv() method as a string.

In the following code, the DataFrame df is saved as a CSV file named countries.csv inside the sample_files folder.

In [26]:
df = pd.DataFrame(index=['France', 'Germany', 'USA', 'Japan'],
                  columns=['Continent', 'Company'],
                  data=[['Europe', 'Carrefour'], ['Europe', 'Siemens'],
                        ['North America', 'Apple'],
                        ['Asia', 'Sony']])
df

,Continent,Company
France,Europe,Carrefour
Germany,Europe,Siemens
USA,North America,Apple
Japan,Asia,Sony


In [27]:
df.to_csv('sample_files/countries.csv')

![A screenshot of an Excel file containing index, 2 columns and 4 rows of data.](pict/csv1.png)
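To inspect the CSV text without opening the file, to_csv() can be called without a path, in which case it returns the result as a string. The sketch below uses a small, made-up DataFrame named `df_demo` rather than the df defined above.

```python
import pandas as pd

# Hypothetical two-row DataFrame used only for this illustration.
df_demo = pd.DataFrame(
    {'Continent': ['Europe', 'Asia'], 'Company': ['Siemens', 'Sony']},
    index=['Germany', 'Japan'],
)

# With no path, to_csv() returns the CSV content as a string.
csv_text = df_demo.to_csv()

for line in csv_text.splitlines():
    print(line)
# ,Continent,Company
# Germany,Europe,Siemens
# Japan,Asia,Sony
```

The first header cell is empty because the index has no name.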

### index
Set the index parameter to False to prevent the DataFrame’s index from being written to the output file.

In [30]:
df.to_csv('sample_files/countries.csv', index=False)

![A screenshot of an Excel file containing 2 columns and 4 rows of data.](pict/csv2.png)
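The effect of this parameter shows up when the data is read back. In the round-trip sketch below (using a made-up `df_demo` and in-memory text instead of a file), the unnamed index written by default returns as an extra column labeled `Unnamed: 0`, while index=False avoids it.

```python
import io

import pandas as pd

# Hypothetical DataFrame with a default integer index.
df_demo = pd.DataFrame(
    {'Continent': ['Europe', 'Asia'], 'Company': ['Siemens', 'Sony']}
)

# Write to CSV text with and without the index, then read it back.
with_index = pd.read_csv(io.StringIO(df_demo.to_csv()))
without_index = pd.read_csv(io.StringIO(df_demo.to_csv(index=False)))

print(list(with_index.columns))     # ['Unnamed: 0', 'Continent', 'Company']
print(list(without_index.columns))  # ['Continent', 'Company']
```

When the index carries meaningful labels, an alternative is to keep it in the file and pass index_col=0 when reading.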

### columns
Specifies which columns of the DataFrame should be written to the output file.

In [33]:
df.to_csv('sample_files/countries.csv', columns=['Company'])

![A screenshot of an Excel file containing the index, 1 column and 4 rows of data.](pict/csv3.png)
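A quick way to confirm which columns are written is to look at the returned CSV text. This sketch again assumes a small, made-up `df_demo`; note that the index is still written unless index=False is also passed.

```python
import pandas as pd

# Hypothetical DataFrame used only for this illustration.
df_demo = pd.DataFrame(
    {'Continent': ['Europe', 'Asia'], 'Company': ['Siemens', 'Sony']},
    index=['Germany', 'Japan'],
)

# Only the Company column (plus the index) is written.
lines = df_demo.to_csv(columns=['Company']).splitlines()

print(lines)  # [',Company', 'Germany,Siemens', 'Japan,Sony']
```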

### header
Specifies whether to write column labels to the output file. It can also accept a list of custom labels.

In [36]:
df.to_csv('sample_files/countries.csv', header=False)

![A screenshot of an Excel file containing 2 columns and 4 rows of data.](pict/csv4.png)

In [62]:
df.to_csv('sample_files/countries.csv', header=['CONTINENT', 'COMPANY'])

![A screenshot of an Excel file containing 2 columns and 4 rows of data.](pict/csv5.png)
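Both forms of the header parameter can be compared on the returned CSV text. The sketch below assumes a made-up `df_demo`; when a list is passed, it needs one label per written column.

```python
import pandas as pd

# Hypothetical DataFrame used only for this illustration.
df_demo = pd.DataFrame(
    {'Continent': ['Europe', 'Asia'], 'Company': ['Siemens', 'Sony']},
    index=['Germany', 'Japan'],
)

# header=False drops the header row, so the first line is already data.
no_header = df_demo.to_csv(header=False).splitlines()
print(no_header[0])  # Germany,Europe,Siemens

# A list replaces the column labels in the output only; df_demo is unchanged.
renamed = df_demo.to_csv(header=['CONTINENT', 'COMPANY']).splitlines()
print(renamed[0])             # ,CONTINENT,COMPANY
print(list(df_demo.columns))  # ['Continent', 'Company']
```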

## to_excel()
The to_excel() method is similar to to_csv() but saves the data as an Excel file.
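The following round-trip sketch writes a small, made-up DataFrame to a temporary .xlsx file and reads it back with read_excel(). It assumes an Excel engine such as openpyxl is installed, since pandas relies on an optional dependency for this format; the file name, sheet name, and `df_demo` are illustrative.

```python
import os
import tempfile

import pandas as pd

# Hypothetical DataFrame used only for this illustration.
df_demo = pd.DataFrame(
    {'Continent': ['Europe', 'Asia'], 'Company': ['Siemens', 'Sony']},
    index=['Germany', 'Japan'],
)

loaded = None
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'countries.xlsx')
    try:
        # sheet_name chooses the worksheet to write to and read from.
        df_demo.to_excel(path, sheet_name='Countries')
        loaded = pd.read_excel(path, sheet_name='Countries', index_col=0)
    except ImportError:
        # Raised when no Excel engine (for example openpyxl) is available.
        print('An Excel engine such as openpyxl is not installed.')

if loaded is not None:
    print(list(loaded.columns))  # ['Continent', 'Company']
    print(list(loaded.index))    # ['Germany', 'Japan']
```

As with to_csv(), passing index=False to to_excel() would leave the index out of the worksheet.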