# Data Loading, Storage, and File Formats

Accessing data and getting it into your script is the first step of any data project. This section covers several ways of importing data from files and other sources into a Python application, as well as ways to export data to files.

Input and output typically fall into a few main categories: reading text files and other, more efficient on-disk formats; loading data from databases; and interacting with network sources such as web APIs.

### Content
- 1. Reading and writing data in text formats.
    - Handling Plain text.
    - Getting Tabular data from a text file.
- 2. Loading data from databases.
- 3. Interacting with network sources like web APIs.

In [8]:
import pandas as pd

## 1. Reading and writing data in text formats

### Handling Plain text 

Text can arrive in different formats, and plain text (`.txt`) files are perhaps the most common file type you'll encounter. Luckily, Python doesn't require a special library to process them; you can simply rely on the methods of the file object returned by the built-in `open()` function.

To Python, a text file is a sequence of strings, each of which is one line of the file: a sequence of characters ending in a non-displayed newline character (`\n`), or hard return.
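A minimal sketch that makes those newline characters visible. The file name `sample.txt` and its contents are hypothetical; the file is written to a temporary directory so the snippet runs anywhere.

```python
import os
import tempfile

# Hypothetical sample file: two one-line "paragraphs" separated by a blank line.
sample = "First paragraph.\n\nSecond paragraph.\n"

# Write the sample to a temporary file so the example is self-contained.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write(sample)

# Iterating over a file object yields one string per line,
# each still ending in the newline character '\n'.
with open(sample_path, "r", encoding="utf-8") as f:
    lines = [line for line in f]

print(lines)  # ['First paragraph.\n', '\n', 'Second paragraph.\n']
```

The blank line between the two paragraphs shows up as a string containing only `'\n'`, which is exactly how Python sees the blank lines in the passage below.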

Take a look at the following example. To humans, the passage consists of four paragraphs of several sentences each; to Python, however, it consists of four non-empty lines with three blank lines between them.

In [5]:
# Reading a text file in Python 

path = r"C:\Users\jober\OneDrive\Desktop\Data Science\Data Science - Study notes\Data_used\Marine.txt"

with open(path, 'r') as f:  # The first argument, 'path', specifies where the file is located; the second controls how the file will be used ('r' for read-only).
    content = f.read()      # The read() method returns the entire content of the file as a single string.
print(content)

# The with keyword ensures that the file object is properly closed when the block finishes, even if an error occurs.

The Marine Mammal Center (TMMC) is a private, non-profit U.S. organization that was established in 1975 for the purpose of rescuing, rehabilitating and releasing marine mammals who are injured, ill or abandoned. It was founded in Sausalito, California, by Lloyd Smalley, Pat Arrigoni and Paul Maxwell. Since 1975, TMMC has rescued over 24,000 marine mammals. 

It also serves as a center for environmental research and education regarding marine mammals, namely cetaceans (whales, dolphins and porpoises), pinnipeds (seals, fur seals, walruses and sea lions), otters and sirenians (manatees and dugongs). 

Marine mammal abandonment refers to maternal separation; pups that have been separated from their mother before weaning. At the center, they receive specialized veterinary care: they are diagnosed, treated, rehabilitated and ideally, released back into the wild. 

Animals in need of assistance are usually identified by a member of the public who has contacted the center. These animals repre
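The introduction also promises ways to export data, and the comment above says `with` closes the file. A minimal sketch covering both, assuming a hypothetical output file `notes.txt` in a temporary directory: mode `'w'` creates (or overwrites) the file, and `f.closed` confirms the file was closed once the block ended.

```python
import os
import tempfile

# Hypothetical output file in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# Mode 'w' creates the file (or overwrites an existing one) for writing text.
with open(out_path, "w", encoding="utf-8") as f:
    f.write("Line one\n")
    f.writelines(["Line two\n", "Line three\n"])  # writelines does not add '\n' itself

# Once the with-block ends, the file object has been closed automatically.
print(f.closed)  # True

# Read the file back to confirm what was written.
with open(out_path, "r", encoding="utf-8") as f:
    text = f.read()
print(text.splitlines())  # ['Line one', 'Line two', 'Line three']
```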

Rather than printing the whole text, you can store the lines in a list using a list comprehension:

In [7]:
# Creating a list from a text file in Python 

path = r"C:\Users\jober\OneDrive\Desktop\Data Science\Data Science - Study notes\Data_used\Marine.txt"

with open(path, 'r') as f:
    # strip() removes the trailing '\n' (and surrounding spaces); the condition skips blank lines.
    lst = [line.strip() for line in f if line.strip()]

lst

['The Marine Mammal Center (TMMC) is a private, non-profit U.S. organization that was established in 1975 for the purpose of rescuing, rehabilitating and releasing marine mammals who are injured, ill or abandoned. It was founded in Sausalito, California, by Lloyd Smalley, Pat Arrigoni and Paul Maxwell. Since 1975, TMMC has rescued over 24,000 marine mammals.',
 'It also serves as a center for environmental research and education regarding marine mammals, namely cetaceans (whales, dolphins and porpoises), pinnipeds (seals, fur seals, walruses and sea lions), otters and sirenians (manatees and dugongs).',
 'Marine mammal abandonment refers to maternal separation; pups that have been separated from their mother before weaning. At the center, they receive specialized veterinary care: they are diagnosed, treated, rehabilitated and ideally, released back into the wild.',
 'Animals in need of assistance are usually identified by a member of the public who has contacted the center. These anima

From this point on, you can use any of Python's string methods to process the lines, for example to organize them in tabular form.
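A minimal sketch of that idea, assuming a short hypothetical list `paragraphs` as a stand-in for `lst` above: each string becomes one row of a DataFrame, with a couple of columns derived from ordinary string methods.

```python
import pandas as pd

# Hypothetical stand-in for `lst`: one string per non-blank line.
paragraphs = [
    "Seals rest on the beach.",
    "Sea otters eat urchins.",
]

# One row per paragraph; split() and len() provide simple derived columns.
df = pd.DataFrame({
    "paragraph": range(1, len(paragraphs) + 1),
    "text": paragraphs,
    "n_words": [len(p.split()) for p in paragraphs],
    "n_chars": [len(p) for p in paragraphs],
})
print(df)
```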

### Getting Tabular data from a text file

Tabular data is easy to manage with `pandas`, which provides a number of functions for reading tabular data into a DataFrame object. Most of the parsing functions in pandas expect a single character as the delimiter between columns. In some cases, however, a table has no fixed delimiter and uses whitespace or some other pattern to separate fields.
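For the single-character case, a minimal sketch using hypothetical in-memory data (wrapped in `io.StringIO` so no file is needed): `pd.read_csv` splits on `','` by default, `pd.read_table` on a tab (`'\t'`), and the `sep` parameter overrides either default.

```python
import io
import pandas as pd

# The same small table written with different delimiters (hypothetical data).
comma_text = "name,score\nAna,90\nLuis,85\n"
tab_text = "name\tscore\nAna\t90\nLuis\t85\n"
semi_text = "name;score\nAna;90\nLuis;85\n"

# read_csv splits on ',' by default; read_table splits on '\t' by default.
df_csv = pd.read_csv(io.StringIO(comma_text))
df_tab = pd.read_table(io.StringIO(tab_text))

# An explicit sep overrides the default, e.g. for a semicolon-separated file.
df_semi = pd.read_csv(io.StringIO(semi_text), sep=";")

print(df_csv.equals(df_tab), df_csv.equals(df_semi))  # True True
```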

In [10]:
# Example of using whitespace as the delimiter between columns.
path = r"C:\Users\jober\OneDrive\Desktop\Data Science\Data Science - Study notes\Data_used\Price_table.txt"
table = pd.read_table(path, sep=r'\s+') # The regular expression \s+ matches one or more whitespace characters, so any run of spaces or tabs separates columns.
table

|   | MemStartDate | TotalPrice | UnitPrice |
|---|--------------|------------|-----------|
| 0 | 2007-07-13   | 50.5       | 5.5       |
| 1 | 2006-01-13   | 10.4       | 1.4       |
| 2 | 2010-08-13   | 3.5        | 0.5       |
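Because `Price_table.txt` itself is not shown, the sketch below assumes it looks roughly like the string `raw` (a hypothetical reconstruction from the output above, with uneven spacing between columns). It also covers the export direction: `parse_dates` converts the date column to datetime values, and `DataFrame.to_csv` writes the table back out, here to an in-memory buffer instead of a real file.

```python
import io
import pandas as pd

# Assumed stand-in for Price_table.txt: columns separated by varying runs of spaces.
raw = """MemStartDate  TotalPrice   UnitPrice
2007-07-13    50.5         5.5
2006-01-13    10.4         1.4
2010-08-13    3.5          0.5
"""

# r"\s+" (one or more whitespace characters) absorbs the uneven spacing;
# parse_dates converts the date strings into datetime64 values.
table = pd.read_csv(io.StringIO(raw), sep=r"\s+", parse_dates=["MemStartDate"])
print(table.dtypes)

# Export as comma-separated text; index=False omits the 0, 1, 2 row labels.
buffer = io.StringIO()
table.to_csv(buffer, index=False)
print(buffer.getvalue())
```

To write to disk instead, pass a file path such as `"Price_table.csv"` (hypothetical name) to `to_csv` in place of the buffer.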
