# CSV

> A [**comma-separated values (CSV)**](https://en.wikipedia.org/wiki/Comma-separated_values) file is a delimited text file that uses a comma to separate values.
> 
> Each line of the file is a data record.
> 
> Each record consists of one or more fields, separated by commas.

Load CSV data with a single row per document.

In [1]:
from langchain_community.document_loaders.csv_loader import CSVLoader

In [2]:
loader = CSVLoader(file_path='./example_data/mlb_teams_2012.csv')
data = loader.load()

In [3]:
type(data)

list

In [12]:
data[:3]

[Document(page_content='Team: Nationals\n"Payroll (millions)": 81.34\n"Wins": 98', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 0}),
 Document(page_content='Team: Reds\n"Payroll (millions)": 82.20\n"Wins": 97', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 1}),
 Document(page_content='Team: Yankees\n"Payroll (millions)": 197.96\n"Wins": 95', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 2})]
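
To make the one-document-per-row mapping concrete, here is a minimal standard-library sketch of the idea. It is not LangChain's implementation: the inlined sample data and the `rows_to_text` helper are hypothetical, and the real `CSVLoader` may differ in details such as whitespace handling and metadata.

```python
import csv
import io

# Hypothetical sample data, inlined so the sketch is self-contained.
SAMPLE = "Team,Payroll,Wins\nNationals,81.34,98\nReds,82.20,97\n"


def rows_to_text(csv_text):
    """Return one block of 'column: value' lines per CSV record."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        "\n".join(f"{key}: {value}" for key, value in row.items())
        for row in reader
    ]


blocks = rows_to_text(SAMPLE)
print(blocks[0])
```

Each element of `blocks` plays the role of one document's `page_content`.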

# Customizing the CSV Parsing and Loading

See the [**csv module documentation**](https://docs.python.org/3/library/csv.html) for more information on which csv args are supported.

In [6]:
loader = CSVLoader(file_path='./example_data/mlb_teams_2012.csv', csv_args={
    'delimiter': ',',
    'quotechar': '"',
    'fieldnames': ['MLB Team', 'Payroll in millions', 'Wins']
})

In [7]:
type(loader)

langchain_community.document_loaders.csv_loader.CSVLoader

In [8]:
loader

<langchain_community.document_loaders.csv_loader.CSVLoader at 0x7ff07e0e6e90>

In [9]:
data = loader.load()

In [11]:
data[:3]

[Document(page_content='MLB Team: Team\nPayroll in millions: "Payroll (millions)"\nWins: "Wins"', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 0}),
 Document(page_content='MLB Team: Nationals\nPayroll in millions: 81.34\nWins: 98', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 1}),
 Document(page_content='MLB Team: Reds\nPayroll in millions: 82.20\nWins: 97', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 2})]
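
Row 0 in this output holds the file's header text. That is consistent with how Python's `csv.DictReader` treats `fieldnames`: when names are passed explicitly, the first line of the file is read as an ordinary data row instead of as a header. The sketch below shows the difference with hypothetical inlined data; it assumes the loader forwards `csv_args` to `csv.DictReader`, which the output above suggests.

```python
import csv
import io

# Hypothetical sample data with a header line.
SAMPLE = "Team,Payroll,Wins\nNationals,81.34,98\n"

# Default: the first line supplies the field names and is not returned as a row.
default_rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Explicit fieldnames: the first line is returned as an ordinary data row.
named_rows = list(
    csv.DictReader(
        io.StringIO(SAMPLE),
        fieldnames=["MLB Team", "Payroll in millions", "Wins"],
    )
)

print(len(default_rows))          # 1
print(len(named_rows))            # 2
print(named_rows[0]["MLB Team"])  # Team
```

If the header row should not become a document, one option is to drop the first element of the loaded list, for example `data[1:]`.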

# Specify a Column to Identify the Document Source

Use the `source_column` argument to specify a source for the document created from each row.

Otherwise, `file_path` will be used **as the source for all documents created from the CSV file**.

This is useful when using documents loaded from CSV files **for chains that answer questions using sources**.
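
The fallback rule described above can be sketched with the standard library alone. This is an illustration of the rule, not the loader's code: the `row_sources` helper, the sample data, and the `teams.csv` path are all hypothetical.

```python
import csv
import io

# Hypothetical sample data, inlined so the sketch is self-contained.
SAMPLE = "Team,Wins\nNationals,98\nReds,97\n"


def row_sources(csv_text, file_path, source_column=None):
    """Return the source for each record: the named column's value if given, else the file path."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row[source_column] if source_column is not None else file_path
        for row in reader
    ]


print(row_sources(SAMPLE, "teams.csv"))
print(row_sources(SAMPLE, "teams.csv", source_column="Team"))
```

With no `source_column`, every record shares the file path as its source; with one, each record carries its own value, which lets a question-answering chain cite the specific row it drew from.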



In [13]:
loader = CSVLoader(file_path='./example_data/mlb_teams_2012.csv', source_column="Team")

data = loader.load()

In [14]:
data

[Document(page_content='Team: Nationals\n"Payroll (millions)": 81.34\n"Wins": 98', metadata={'source': 'Nationals', 'row': 0}),
 Document(page_content='Team: Reds\n"Payroll (millions)": 82.20\n"Wins": 97', metadata={'source': 'Reds', 'row': 1}),
 Document(page_content='Team: Yankees\n"Payroll (millions)": 197.96\n"Wins": 95', metadata={'source': 'Yankees', 'row': 2}),
 Document(page_content='Team: Giants\n"Payroll (millions)": 117.62\n"Wins": 94', metadata={'source': 'Giants', 'row': 3}),
 Document(page_content='Team: Braves\n"Payroll (millions)": 83.31\n"Wins": 94', metadata={'source': 'Braves', 'row': 4}),
 Document(page_content='Team: Athletics\n"Payroll (millions)": 55.37\n"Wins": 94', metadata={'source': 'Athletics', 'row': 5}),
 Document(page_content='Team: Rangers\n"Payroll (millions)": 120.51\n"Wins": 93', metadata={'source': 'Rangers', 'row': 6}),
 Document(page_content='Team: Orioles\n"Payroll (millions)": 81.43\n"Wins": 93', metadata={'source': 'Orioles', 'row': 7}),
 Docume