# Input and Output in Python
Getting data into a program can be one of the most time-intensive parts of a project. Python has a number of packages designed to import and export data with minimal effort and code. I have included some simple examples below. The most popular package (*or at least the most ubiquitous*) these days is `pandas`. (I have included a pandas cheat sheet in the `main` repository.) `pandas` is a more advanced topic that will be covered later. Right now, all you need to know is that `pandas` introduces a new datatype called a `dataframe`. Dataframes are powerful and flexible tools for data wrangling.

## Details for Navigating File Directories
**Details:** A single backslash does not work reliably as a path separator in a Python string, because the backslash starts an escape sequence (for example, `\t` is a tab character). Use a forward slash instead, or double the backslash (`\\`).
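
A minimal sketch of the path options described above. The folder and file names match the ones used later in this notebook, but nothing is opened here, so the file does not need to exist on your machine.

```python
import os

# Forward slashes work in Python on Windows, macOS, and Linux.
forward = "data/globaltemps.txt"

# A backslash starts an escape sequence inside a normal string,
# so either double it or use a raw string (the r prefix).
doubled = "data\\globaltemps.txt"
raw = r"data\globaltemps.txt"

# os.path.join inserts the separator that is correct for your system.
joined = os.path.join("data", "globaltemps.txt")

print(doubled == raw)              # True: both contain one backslash
print(len("a\tb"), len(r"a\tb"))   # 3 4: "\t" is a single tab character
print(joined)
```

Letting `os.path.join` build the path is usually the safest choice when a notebook is shared between Windows and non-Windows machines.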

In [None]:
# This code is used to navigate the file structure of main. You may or may not need to run this.
import os
cwdup = os.path.split(os.getcwd())
os.chdir(cwdup[0])
print(cwdup)

In [None]:
print(os.getcwd())

In [None]:
# Rejoin the two parts of the path with the separator that suits your operating system
os.chdir(os.path.join(cwdup[0], cwdup[1]))

In [None]:
print(os.getcwd())

In [None]:
# A single forward slash is all that is needed as a separator
txtfile = "data/globaltemps.txt"
csvfile = "data/sunspotsbyyear.csv"
xlsfile = "data/GlobalCarbonBudget2022.xlsx"

In [None]:
txtfile

# Functions for Reading Files
Basic Python has a number of native functions used to read files. 
## Native `open`
The built-in function `open` is the simplest way to open plain text files and the first one to reach for. It takes a filename string and an optional mode string (the default is `'r'`). The available modes, adapted from [geeksforgeeks](https://www.geeksforgeeks.org/open-a-file-in-python/), are:


| Mode | Description |
|------|:-------------|
| 'r'  | Open text file for reading. Raises an I/O error if the file does not exist. |
| 'r+' | Open the file for reading and writing. Raises an I/O error if the file does not exist. |
| 'w'  | Open the file for writing. Truncates the file if it already exists. Creates a new file if it does not exist. |
| 'w+' | Open the file for reading and writing. Truncates the file if it already exists. Creates a new file if it does not exist. |
| 'a'  | Open the file for writing. The data being written will be inserted at the end of the file. Creates a new file if it does not exist. |
| 'a+' | Open the file for reading and writing. The data being written will be inserted at the end of the file. Creates a new file if it does not exist. |
| 'rb' | Open the file for reading in binary format. Raises an I/O error if the file does not exist. |
| 'rb+'| Open the file for reading and writing in binary format. Raises an I/O error if the file does not exist. |
| 'wb' | Open the file for writing in binary format. Truncates the file if it already exists. Creates a new file if it does not exist. |
| 'wb+'| Open the file for reading and writing in binary format. Truncates the file if it already exists. Creates a new file if it does not exist. |
| 'ab' | Open the file for appending in binary format. Inserts data at the end of the file. Creates a new file if it does not exist. |
| 'ab+'| Open the file for reading and appending in binary format. Inserts data at the end of the file. Creates a new file if it does not exist. |
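
The sketch below walks through three of the text modes from the table. It works inside a temporary folder, and the file name `modes_demo.txt` is a made-up name for illustration only.

```python
import os
import tempfile

# Work in a temporary folder so no real files are touched.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "modes_demo.txt")  # hypothetical file name

    # 'r' fails when the file does not exist yet.
    missing = False
    try:
        open(path, "r")
    except FileNotFoundError:
        missing = True

    # 'w' creates the file (or empties an existing one).
    handle = open(path, "w")
    handle.write("first line\n")
    handle.close()

    # 'a' keeps the existing content and adds to the end.
    handle = open(path, "a")
    handle.write("second line\n")
    handle.close()

    handle = open(path, "r")
    after_append = handle.read()
    handle.close()

    # 'w' again truncates: the two earlier lines are gone.
    handle = open(path, "w")
    handle.write("only line\n")
    handle.close()

    handle = open(path, "r")
    after_rewrite = handle.read()
    handle.close()

print(missing)
print(repr(after_append))
print(repr(after_rewrite))
```

The practical takeaway is that `'w'` silently discards whatever was in the file, so `'a'` is the safer choice when you want to keep earlier output.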




In [None]:
#Creates a file handle to reference
txthandle = open(txtfile, "r")
type(txthandle)

In [None]:
#Get the whole file's content in one string
txthandle.read()

Be aware that `.read()` returns the entire file as one string and leaves the *file position indicator* at the end of the file. What happens if you try to `read` again?

In [None]:
#Get the next n characters, starting from the current file position
txthandle.read(19)

In [None]:
#Read line by line
print(txthandle.readline())
print(txthandle.readline())

## The methods `seek()` and `tell()`
The file handle created by `open` keeps track of where it is in the file. The methods `seek` and `tell` exist to let you control and query the location of the file handle within the file. 
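
Here is a small self-contained sketch of how the file position behaves. It uses a temporary file holding the ten characters `0123456789` (invented content, not one of the course data files) so that the offsets are easy to follow.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "seek_demo.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("0123456789")

    handle = open(path, "r")
    everything = handle.read()     # reads all ten characters
    second_read = handle.read()    # position is at the end, nothing is left
    end_position = handle.tell()   # offset of the end of the file

    handle.seek(0)                 # move back to the start
    first_three = handle.read(3)

    handle.seek(7)                 # jump straight to offset 7
    last_three = handle.read()
    handle.close()

print(repr(second_read), end_position)
print(first_three, last_three)
```

Calling `seek(0)` is the usual way to reread a file without closing and reopening it.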

In [None]:
#seek method allows you to move around the file
txthandle.seek(7)

In [None]:
txthandle.tell()

In [None]:
txthandle.read(6)

In [None]:
#Always close your file handle!
txthandle.close()

Great! Can the `open` function open more complicated files? Sure! But beware that even more special characters may appear.

In [None]:
csvhandle = open(csvfile,'r')

In [None]:
csvhandle.read()

In [None]:
csvhandle.read(81)

In [None]:
csvhandle.readline()
csvhandle.readline()


In [None]:
csvhandle.close()

## Numpy
The package `numpy` contains functions for reading files; two are `loadtxt` and `genfromtxt`.

`loadtxt` reads the file data into a numpy array.

In [None]:
import numpy as np
temps = np.loadtxt(txtfile)
type(temps)

In [None]:
temps[1,1]

In [None]:
temps[1][1]

The function `genfromtxt` is more flexible: it can return a masked array and substitute filling values for **missing data**. By default it splits values on whitespace, so for other formats, such as CSV, you must specify a `delimiter`, the character that separates values in the file.
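
To see the missing-data handling in isolation, the sketch below reads a tiny comma-separated string instead of a file. The numbers are invented, and the second row deliberately has an empty middle field.

```python
from io import StringIO

import numpy as np

text = "1,2,3\n4,,6\n"  # made-up data with one empty field

# Default behaviour: the empty field becomes nan.
default = np.genfromtxt(StringIO(text), delimiter=",")

# filling_values replaces missing entries with a value you choose.
filled = np.genfromtxt(StringIO(text), delimiter=",", filling_values=-999)

# usemask=True returns a masked array that remembers which entries were missing.
masked = np.genfromtxt(StringIO(text), delimiter=",", usemask=True)

print(default)
print(filled)
print(masked.mask)
print(masked.mean())  # masked entries are left out of the calculation
```

A masked array is handy when a placeholder such as `-999` would otherwise distort a sum or a mean.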

In [None]:
temp = np.genfromtxt(txtfile,delimiter='\t')
temp

In [None]:
sunspots = np.genfromtxt(csvfile,delimiter=',')
sunspots

In [None]:
sunspots = np.genfromtxt(csvfile,delimiter=',', usemask = True, filling_values=np.nan)
sunspots

## Common Way with No `close` Necessary
Using the `with` statement with `open` gives you a quick and concise way of reading a file without having to close the file handle yourself: the file is closed automatically when the indented block ends. Below is an example of the syntax.

In [None]:
with open(txtfile) as glob:
    #print(glob.read())
    temps = glob.read()

In [None]:
temps

In [None]:
#What happens if you use the handle after the with block has ended?
glob.readline()
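
The cell above fails because the handle is closed as soon as the `with` block ends. The sketch below shows the same behavior on a temporary file (the file name and its contents are invented) and catches the error so the cell runs to completion.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "with_demo.txt")  # hypothetical file name
    with open(path, "w") as out:
        out.write("line one\nline two\n")

    with open(path) as handle:
        first = handle.readline()
        open_inside = not handle.closed   # still open inside the block

    closed_after = handle.closed          # closed automatically afterwards

    # Reading from a closed handle raises ValueError.
    error_message = None
    try:
        handle.readline()
    except ValueError as err:
        error_message = str(err)

print(open_inside, closed_after)
print(error_message)
```

Anything you need from the file should therefore be read into a variable inside the block, as `temps` was above.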

## Pandas (A first look)
After loading the package `pandas`, we can use the `read_table()` function to pull data from a text file. You could also use `read_csv()` with `sep="\t"` to read a tab-separated file, or with `sep="\s+"` for whitespace-separated values. By default, pandas treats the first row as a header unless told otherwise (for example with `header=None`). All the functions below return a `dataframe`, an object in Python that stores data and allows access with a certain syntax that I often refer to as "dot notation".
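
The following sketch illustrates the `sep`, `header`, and `names` options on two short in-memory strings rather than on the course files. The year and temperature values are invented for illustration.

```python
from io import StringIO

import pandas as pd

tab_text = "1880\t-0.16\n1881\t-0.08\n"     # tab-separated, made-up values
space_text = "1880   -0.16\n1881 -0.08\n"   # uneven runs of spaces

cols = ["year", "temp"]
tabs = pd.read_csv(StringIO(tab_text), sep="\t", header=None, names=cols)
spaces = pd.read_csv(StringIO(space_text), sep=r"\s+", header=None, names=cols)

# Without header=None the first data row is used up as column names.
default_header = pd.read_csv(StringIO(tab_text), sep="\t")

print(tabs)
print(tabs.equals(spaces))
print(list(default_header.columns), len(default_header))

# "Dot notation" and bracket notation reach the same column.
print(tabs.temp.equals(tabs["temp"]))
```

The raw string `r"\s+"` keeps Python from trying to interpret `\s` as an escape sequence before pandas sees it.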

In [None]:
import pandas as pd
txt1 = pd.read_table(txtfile, header=None,names=['year','temp'])
txt2 = pd.read_csv(txtfile,header=None,sep=r'\s+')  # raw string so \s is passed through unchanged
csv1 = pd.read_csv(csvfile,header=None)
csv2 = pd.read_csv(csvfile,header=None,names = ['year', 'numspots', 'stdev','Nobs','confirmed'])

In [None]:
txt1

In [None]:
txt2

In [None]:
csv2

In [None]:
type(csv2)

As `csv2` shows, you can add column names while loading the file by passing the `names` option.

If you look at `csvfile`, you'll see some columns with `-1` as a value; this indicates *missing data*. Classifying your missing data properly will help you avoid accidentally using the placeholder value in a calculation. You can specify this with another option, `na_values`:

In [None]:
csv3 = pd.read_csv(csvfile,header=None,names = ['year', 'numspots', 'stdev','Nobs','confirmed'],na_values=['-1'])
#csv3.stdev
print(csv3)
csv3['stdev']
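
To see why marking missing data matters, the sketch below compares a column mean with and without `na_values`. The three rows are invented and only mimic the layout of the sunspot file.

```python
from io import StringIO

import pandas as pd

text = "1818,52.9,-1\n1819,38.5,-1\n1820,24.2,3.1\n"  # made-up rows
cols = ["year", "numspots", "stdev"]

# Without na_values, -1 is treated as a real measurement.
raw = pd.read_csv(StringIO(text), header=None, names=cols)

# With na_values, -1 becomes NaN and is skipped by mean().
clean = pd.read_csv(StringIO(text), header=None, names=cols, na_values=["-1"])

raw_mean = raw["stdev"].mean()
clean_mean = clean["stdev"].mean()
n_missing = int(clean["stdev"].isna().sum())

print(raw_mean, clean_mean, n_missing)
```

Note that `na_values` given as a list applies to every column, so this approach assumes `-1` is never a legitimate value anywhere in the file.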

### Reading Excel files with Pandas
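pandas reads Excel files in much the same manner (for `.xlsx` files it relies on the `openpyxl` package being installed). You can specify which sheet to import, how many rows to skip, and which row to use as the header.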

In [None]:
xcel1 = pd.read_excel(xlsfile,sheet_name="Global Carbon Budget", skiprows=20,header=0)

In [None]:
print(xcel1)
print(type(xcel1))
xcel1['Year'][2]

# Exercise: Reading Files Practice

**Part A: Multiple Reading Methods**
1. Read `globaltemps.txt` using the native `open()` function and print the first line.
2. Read the same file using `pandas` with column names ['year', 'temp'].
3. Read it using `np.genfromtxt()` with the correct delimiter.
4. Compare: What data types do you get from pandas vs numpy?

**Part B: Data Access**
1. Using your pandas dataframe, find the temperature for the year 1900.
2. Print the rows of temperature data from 1900 to 1920.

## Writing to a Plain Text file
Python contains built-in functions to output information to a plain text file. The process involves opening a file for writing, writing to that file, and then closing it.

In [None]:
txt1

In [None]:
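#write() only accepts strings; temps is the string read from txtfile earlier
#(to write the dataframe instead, convert it first with txt1.to_string())
f = open("test1.txt","w+")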
f.write(temps)
f.write('\n')
f.close()

You may also append to a file (add to it without overwriting).

In [None]:
f= open("test1.txt","a+")
f.write(temps)
f.write('\n')
f.close()
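
As a sketch of writing a dataframe rather than a plain string, the example below converts a small invented dataframe with `to_string` before writing, then appends one more line. The file name `demo_out.txt` and all values are hypothetical.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"year": [1900, 1901], "temp": [-0.08, -0.15]})  # made-up values

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo_out.txt")  # hypothetical file name

    # write() rejects anything that is not a string.
    rejected = False
    try:
        with open(path, "w") as f:
            f.write(df)
    except TypeError:
        rejected = True

    # Convert to text first, then write.
    with open(path, "w") as f:
        f.write(df.to_string(index=False))
        f.write("\n")

    # Append one more row without disturbing what is already there.
    with open(path, "a") as f:
        f.write("1902 -0.28\n")

    with open(path) as f:
        lines = f.readlines()

print(rejected)
print("".join(lines))
```

Using `with` for writing has the same benefit as for reading: the file is closed, and the data flushed to disk, even if an error occurs partway through.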

## Writing to a CSV file
Python does allow you to write to a `csv` file. The details are not very informative at this point, but if that is something you want to do, please look into the built-in package `csv`.
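
For the curious, here is a minimal sketch of the `csv` package in action. The rows and the file name `demo.csv` are invented, and this is only one of several reasonable ways to write a CSV file.

```python
import csv
import os
import tempfile

rows = [["year", "numspots"], [1818, 52.9], [1819, 38.5]]  # made-up values

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.csv")  # hypothetical file name

    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(rows)

    # Reading it back: every value comes back as a string.
    with open(path, newline="") as f:
        read_back = list(csv.reader(f))

print(read_back)
```

If your data is already in a dataframe, the pandas method `to_csv` is typically the shorter route.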

## Writing to an Excel file
`pandas` contains functionality that allows you to write data to a Microsoft Excel file. The method `to_excel` writes a dataframe to a sheet within an Excel spreadsheet, but each call that is given a filename creates the file anew. To write multiple sheets to the same file, you need to pass an `ExcelWriter` object to `to_excel` instead. `ExcelWriter` does require a little more syntax, as shown below:

In [None]:
#must specify a file, you can specify a sheet
csv2.to_excel("sunspotsout.xlsx",sheet_name='sun')
#careful: this second call overwrites the file, so only the 'carbon accounting' sheet remains
xcel1.to_excel("sunspotsout.xlsx",sheet_name='carbon accounting')

In [None]:
with pd.ExcelWriter('pandas_simple.xlsx') as writer:
    csv2.to_excel(writer, sheet_name='sun')
    xcel1.to_excel(writer, sheet_name='carbon')

# Exercise: Writing Files Practice

**Part A: Export Your Temperature Analysis**
1. Using the temperature dataframe from the Reading Files exercise, create a smaller dataset with only the years 1900-1920.
2. Write this subset to a text file called `early_1900s_temps.txt`.

**Part B: Excel Export**
1. Take the same 1900-1920 temperature data.
2. Export it to an Excel file called `temp_subset.xlsx` with sheet name 'early_temps'.

**Part C: Verify Your Work**
1. Read your `temp_subset.xlsx` file back into a new dataframe to confirm it worked.
2. Print the shape of this new dataframe.