<img src="../../predictioNN_Logo_JPG(72).jpg" width=200>

---

## Input/Output


### Introduction to Data Science
### Last Updated: November 14, 2022
---  

### SOURCES 


- pandas read_csv()  
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html


### OBJECTIVES
- Understand how to set a path to a file
- Understand how to change the path
- Introduce data formats: text, csv
- Show how to work with the data formats: read, write
- Introduce how to get help with functions

### CONCEPTS

- file path
- text file
- csv
- delimiter
- `with`, `open()`, `close()`

---


## Working with Paths

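How does Python know where to look for files?  
When you give `open()` a file name without a full path, Python looks for it relative to the *current working directory*. This directory can be retrieved and changed.
The module `os` can help.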

In [None]:
import os

Get the current working directory, which will vary for each computer.  

In [None]:
os.getcwd()
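The working directory can also be changed with `os.chdir()`. Here is a minimal sketch. It switches into a throwaway temporary directory created with `tempfile`, so nothing on your machine is touched, confirms the change, and then switches back.

```python
import os
import tempfile

original_dir = os.getcwd()  # remember where we started

# create a throwaway directory purely for this demonstration
with tempfile.TemporaryDirectory() as tmp_dir:
    os.chdir(tmp_dir)  # change the current working directory
    # relative file names would now be resolved inside tmp_dir
    inside = os.path.samefile(os.getcwd(), tmp_dir)
    print('Now in:', os.getcwd())
    os.chdir(original_dir)  # change back before the directory is deleted

print('Back in:', os.getcwd())
print('Changed successfully?', inside)
```

Changing back before the `with` block ends matters on Windows, where a directory cannot be deleted while it is the working directory.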

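Note: Windows displays paths with backslashes `\`, while macOS and Linux use forward slashes `/`. Python on Windows also accepts forward slashes, so a path written as `'../datasets/data_example_test.csv'` works on all three.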
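Instead of typing separators by hand, you can let Python build the path. The sketch below uses `os.path.join` and the standard-library `pathlib` module. The folder and file names are the ones used in this notebook. The file does not need to exist just to build its path.

```python
import os
from pathlib import Path, PurePosixPath, PureWindowsPath

# os.path.join inserts the separator used by the current operating system
p1 = os.path.join('..', 'datasets', 'data_example_test.csv')
print(p1)

# pathlib builds paths with the / operator, also using the native separator
p2 = Path('..') / 'datasets' / 'data_example_test.csv'
print(p2)

# the "pure" path classes show how the same path is spelled on each system
print(PurePosixPath('..', 'datasets', 'data_example_test.csv'))    # Mac/Linux style
print(PureWindowsPath('..', 'datasets', 'data_example_test.csv'))  # Windows style
```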

If the file is in the current working directory, then we don't need to give a path.

**Here's an example of reading in data from a file** (don't worry about the function specifics, they will be explained later):

In [None]:
f1 = open('data_example.txt','r') # we can simply enter the name of the file, without its path
data = f1.read()                  # read file content
print(data)

---

Next, we would like to load a file named `data_example_test.csv`, but it lives in a different folder called `datasets`.  
Watch what happens if we don't set the path (Python raises a `FileNotFoundError`):

In [None]:
f1 = open('data_example_test.csv','r')
data = f1.read()
print(data)

To navigate to this folder, we need to:
1) go up one folder (`..` refers to the parent folder, so we start the path with `../`)  
2) look in the `datasets` folder (`../datasets`)

Note: going up two folders looks like `../..`, and so on.
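To see where a relative path actually points, you can join it to a starting folder and normalize the result. This sketch uses `posixpath`, the forward-slash flavor of `os.path`, so the output is the same on every system. The starting folder `/home/student/course/notebooks` is a hypothetical example.

```python
import posixpath

start = '/home/student/course/notebooks'  # hypothetical current working directory

# '..' goes up one folder, then we descend into 'datasets'
one_up = posixpath.normpath(posixpath.join(start, '../datasets/data_example_test.csv'))
print(one_up)  # /home/student/course/datasets/data_example_test.csv

# '../..' goes up two folders
two_up = posixpath.normpath(posixpath.join(start, '../..'))
print(two_up)  # /home/student
```

On your own machine, `os.path.abspath('../datasets')` resolves the same kind of path against the real working directory.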

OK now let's put things together:

In [None]:
f1 = open('../datasets/data_example_test.csv','r')
data = f1.read()
print(data)

**It worked!** Navigating with relative paths like this is an important skill.

We will see this again throughout the course.

Next we study different data file formats and how to read from them and write to them.

## Text File Format 

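- text files contain characters (letters, digits, punctuation) rather than images or other binary data  
- they can be saved as plain text or in rich text formats, which add formatting markup
- typical extensions: txt, rtf, log; Microsoft Word's doc and docx are proprietary formats that store formatting alongside the text, so they cannot be read as plain text with `open()`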

## Text in Python

**Read in text file** using `open()`, print the data, and close the file:

In [None]:
f1 = open('../datasets/data_example.txt','r') # 'r' for read mode
data = f1.read()                  # read file content
print(data)
f1.close()

**Using `with open()` is preferred**, as the file is closed automatically, even if an error occurs.

In [None]:
with open('../datasets/data_example.txt', "r") as f:
    data = f.read()
    print(data)

# check if file is closed
print('\nFile closed? \n', f.closed)

**Writing to text file**

`open()` can be used again, in mode `'a'` to append to the end of the file or `'w'` to write. Careful: `'w'` erases any existing content first.

In [None]:
# append to the file
with open('../datasets/data_example.txt', "a") as f:
    f.write('\n' + 'another line')
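To see the difference between the two modes without touching the course data, here is a small sketch that works on a temporary file. The file name `demo.txt` is hypothetical. Mode `'w'` creates the file or erases whatever it held, while `'a'` keeps the existing content and adds to the end.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical file name

    with open(path, 'w') as f:   # 'w' creates the file (or empties an existing one)
        f.write('first line\n')

    with open(path, 'a') as f:   # 'a' adds to the end of the existing content
        f.write('second line\n')

    with open(path, 'r') as f:
        after_append = f.read()

    with open(path, 'w') as f:   # writing with 'w' again replaces everything
        f.write('fresh start\n')

    with open(path, 'r') as f:
        after_overwrite = f.read()

print(repr(after_append))     # 'first line\nsecond line\n'
print(repr(after_overwrite))  # 'fresh start\n'
```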

**Aside 1: Context Manager**  
The `with` statement uses a *context manager*: an object that sets up a temporary context when the block starts and cleans it up when the block ends, even if an error occurred.  
Here, the file returned by `open()` is the context manager, and the cleanup is closing the file.
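Roughly speaking, a `with` block does the same housekeeping as the explicit `try`/`finally` pattern below. This sketch is an illustration of the idea rather than a literal description of how Python expands `with`. It first writes a small temporary file so it runs anywhere.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')  # hypothetical file for the demo
    with open(path, 'w') as f:
        f.write('hello\n')

    # the manual pattern: close the file in 'finally' so it runs even after an error
    f1 = open(path, 'r')
    try:
        data_manual = f1.read()
    finally:
        f1.close()

    # the context-manager pattern: closing happens automatically
    with open(path, 'r') as f2:
        data_with = f2.read()

print(data_manual == data_with)  # True
print(f1.closed, f2.closed)      # True True
```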

**Aside 2: Getting help on functions**

We are seeing examples of functions such as `print`.

You can learn more about a function by asking for help with `help()` or, in Jupyter/IPython only, by putting `?` in front of its name:

In [None]:
help(print)

In [None]:
?print

---

## CSV Format

A *comma separated values* (CSV) file is a plain text file in which each line is a row of data and the fields within a row are separated by a character, generally a comma.  
A header row containing column (field) names may be included; if present, it is usually the first row.  

Example:
```
name,email,phone
laura palmer,lpalmer@twin_peaks,123-456-7890
agent dale cooper,dcooper@twin_peaks,123-454-7899
```
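Python's built-in `csv` module can parse this example directly. Here is a minimal sketch. It holds the rows in a string wrapped in `io.StringIO` instead of a file, so it runs on its own.

```python
import csv
import io

text = """name,email,phone
laura palmer,lpalmer@twin_peaks,123-456-7890
agent dale cooper,dcooper@twin_peaks,123-454-7899
"""

reader = csv.reader(io.StringIO(text))  # splits each row on commas by default
header = next(reader)                   # the first row holds the column names
rows = list(reader)                     # the remaining rows are data

print(header)  # ['name', 'email', 'phone']
for row in rows:
    print(row)
```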

CSV format is very popular, but using a comma as the separator (delimiter) can be problematic:   
if the data itself contains commas, the delimiter no longer marks field boundaries reliably.

A popular workaround is to enclose fields that contain commas in quotes. That in turn requires escaping quote characters inside a quoted field (usually by doubling them), which not every tool handles correctly.  
This leads to alternative delimiters such as the pipe `|`, which appears less often in data.
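The quoting convention can be seen with the `csv` module. By default, it wraps a field in quotes when the field contains the delimiter or a quote character, and it doubles any quotes inside the field. The row values below are made up for illustration. The last part switches to a pipe delimiter.

```python
import csv
import io

row = ['cooper, dale', 'says "damn fine coffee"', '123-454-7899']  # made-up values

# comma-delimited: the first two fields get wrapped in quotes,
# and the quotes inside the second field are doubled
buf = io.StringIO()
csv.writer(buf).writerow(row)
print(buf.getvalue())

# reading it back recovers the original fields
parsed = next(csv.reader(io.StringIO(buf.getvalue())))
print(parsed == row)  # True

# pipe-delimited: the comma in 'cooper, dale' is now ordinary text, so that field
# is left unquoted (the field containing quote marks is still quoted)
buf_pipe = io.StringIO()
csv.writer(buf_pipe, delimiter='|').writerow(row)
print(buf_pipe.getvalue())
```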


## CSV in Python

**Read data in csv format:**

Here we use **pandas** `read_csv()` to read data in csv format.  
This is the most common method for reading this format.

**NOTE**  
pandas is an essential package for working with data frames (often abbreviated `df`).  
We will have more to say about `pandas` later; it will be used throughout the course.  

Some important parameters:
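- *filepath_or_buffer*: the path to the dataset, including the file name (a URL also works)
- *sep* (alias *delimiter*): the field delimiter; the default is a comma
- *header*: the row number containing the column names; by default pandas uses the first row, and `header=None` means the file has no header row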


[Details](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)
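Here is a small sketch of the `sep` and `header` parameters. To keep it self-contained, it reads pipe-delimited text from an in-memory `io.StringIO` buffer, since `read_csv()` accepts file-like objects as well as paths. The values and the column names passed through the extra `names` parameter are made up for illustration.

```python
import io
import pandas as pd

raw = "5.1|3.5|setosa\n7.0|3.2|versicolor\n"  # made-up pipe-delimited data, no header row

# sep='|' splits on pipes; header=None says the first row is data, not column names
df = pd.read_csv(io.StringIO(raw), sep='|', header=None)
print(df.columns.tolist())  # [0, 1, 2]; pandas numbers the columns itself

# names= supplies our own (hypothetical) column names instead
df_named = pd.read_csv(io.StringIO(raw), sep='|', header=None,
                       names=['length', 'width', 'species'])
print(df_named)
```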

In [None]:
import pandas as pd

# data source: https://archive.ics.uci.edu/ml/datasets/Wine

chem = pd.read_csv('../datasets/data_example_chem.csv')

**Show the first few rows of data (and the header):**

In [None]:
chem.head()

**Take a random sample of data:**

In [None]:
chem.sample(5)
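Because `sample()` picks rows at random, each run returns different rows. Passing `random_state` fixes the random seed, so the same rows come back every time. Here is a sketch on a small made-up DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'x': range(10)})  # made-up data: one column with values 0..9

a = df.sample(5, random_state=42)  # same seed ...
b = df.sample(5, random_state=42)
print(a.index.tolist() == b.index.tolist())  # ... same rows: True

print(len(a))  # 5 rows, drawn without replacement by default
```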

**What is the data type of `chem`?**

In [None]:
type(chem)

**Check if `chem` is a pandas dataframe:**

In [None]:
isinstance(chem, pd.DataFrame)

The data now lives in a pandas dataframe, which supports a wide range of analysis.  
We will do a lot of pandas work in the course.

**Which columns are in `chem`?**

In [None]:
chem.columns

**Write the first two rows to a new csv file** (by default, the row index is also written as an extra first column):

In [None]:
chem.head(2).to_csv('../datasets/data_example_chem_first_two_rows.csv')
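The effect of the index column is easier to see on a tiny example. This sketch uses a made-up DataFrame and lets `to_csv()` return the CSV text, which it does when no path is given. It compares the default output with `index=False`.

```python
import io
import pandas as pd

df = pd.DataFrame({'x': [14.23, 13.2], 'y': [2.43, 2.14]})  # made-up values

with_index = df.to_csv()               # default: the row index becomes the first column
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)

# reading the first version back produces an extra 'Unnamed: 0' column
print(pd.read_csv(io.StringIO(with_index)).columns.tolist())
```

If you want the file to contain only the data columns, pass `index=False` when writing.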

**TRY FOR YOURSELF**  

CSV Exercise

1a) Read in a dataset from this URL:  
'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/bezdekIris.data'

note: this URL can be directly passed to read_csv()

1b) You will notice the first record comes in as a header row.
Pass a parameter to read_csv() so there is no header.

1c) Write the data to a file in the `datasets` folder, using a .txt extension and the pipe separator `|`

In [None]:
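import pandas as pd

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/bezdekIris.data'
# 1a/1b) the file has no header row, so don't treat the first record as column names
iris = pd.read_csv(url, header=None)

# 1c) write to a pipe-delimited .txt file in the datasets folder
iris.to_csv('../datasets/some_data.txt', sep='|')

iris.head()  # only the last expression in a cell is displayed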

---