# Chapter 18: import, read/write, .txt and .csv

In this brief chapter, we'll discuss the bare-bones basics of Python import statements, as well as the basic syntax for reading and writing files. We'll also look at opening / writing plain text files and Comma Separated Value (CSV) files. 

**Questions? Drop em in the Slack!**

## import

For the fullest explanation of import statements, as well as syntax not covered in this overview, **see the [Python documentation](https://docs.python.org/3/reference/import.html)**

One of the things that makes Python such an attractive option for scientific research ("scientific" including the human sciences!) is its rich open source library of packages that can be imported. We've already worked with `Pandas`, which is one example of such a package. A package is simply a set of Python files, or [modules](https://docs.python.org/3/tutorial/modules.html), that together provide functions or objects that can be used when imported into a Python script. The files contain Python functions or [classes](https://docs.python.org/3/tutorial/classes.html) and can import or call each other. When we import a module, though, we don't need to see any of this. 

Some packages come standard with a Python installation. For instance, one of the most useful is the `collections` package. The standard import statement is as follows:

In [1]:
import collections

Note that collections itself is a kind of object, a `module`.

In [2]:
type(collections)

module

Like other objects, modules have attributes (functions, classes, and other names) that we can use. The built-in `dir` function lists them:

In [3]:
dir(collections)

['ChainMap',
 'Counter',
 'OrderedDict',
 'UserDict',
 'UserList',
 'UserString',
 '_Link',
 '_OrderedDictItemsView',
 '_OrderedDictKeysView',
 '_OrderedDictValuesView',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__getattr__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '_chain',
 '_collections_abc',
 '_count_elements',
 '_eq',
 '_heapq',
 '_iskeyword',
 '_itemgetter',
 '_proxy',
 '_recursive_repr',
 '_repeat',
 '_starmap',
 '_sys',
 '_tuplegetter',
 'abc',
 'defaultdict',
 'deque',
 'namedtuple']
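
As a small sketch of using one of the names listed above, `Counter` can be reached through the module with dot notation. The sample sentence is made up for illustration.

```python
import collections

# A made-up sample sentence, purely for illustration
words = "the cat and the dog and the bird".split()

# Counter is one of the names that dir(collections) lists
word_counts = collections.Counter(words)

# most_common(n) returns the n most frequent items with their counts
top_two = word_counts.most_common(2)
print(top_two)
```

Names in the `dir` listing that start with an underscore are internal helpers and are not meant to be called directly.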

We can also assign a shorthand name to a package when we import it, using the `as` keyword:

In [4]:
import numpy as np

Now the package `numpy` is stored under a variable, `np`. 

In [6]:
type(np)

module

Just as with other variables assigned to objects with methods/functions, we can do things with `np`:

In [7]:
np.arange(1,10)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])
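
One common variant is `from ... import ...`, which binds a single name instead of the whole module. The sketch below uses only the standard library so that it runs without extra installs; the alias `col` is just an illustrative choice.

```python
# Import one name from a module rather than the module itself
from collections import Counter

# Import a module under a shorter alias, as with numpy above
import collections as col

# Both names refer to the very same class object
same_object = Counter is col.Counter
print(same_object)

# The imported name can now be used without the module prefix
letter_counts = Counter("banana")
print(letter_counts["a"])
```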

### There are other details about import statements not dealt with here that you can find out more about in the [Python documentation](https://docs.python.org/3/reference/import.html).

## `pip` installing packages

The packages we've been importing here have probably already been installed by your Anaconda setup. But some packages you probably do not have. Third-party Python packages are hosted on the [Python Package Index](https://pypi.org) (PyPI), and can be downloaded and installed by doing a "pip install". 

To do this, we need to supply a shell command to the `terminal` of our computer. On a Mac, you can do this by launching the `terminal` app and writing the command directly; on a PC, you can launch it in the `conda shell` via Anaconda.

One convenient way to run shell commands is from Jupyter notebooks. By adding an exclamation mark to the front of a line in a code cell, we can run such commands. For instance:

In [8]:
! date

Fri 24 Apr 2020 14:33:39 BST


The command `date` is executed as a shell command. We can do something similar when installing Python packages. With an Anaconda installation, the syntax for installing a package is: 

`pip install [module name]`

That command needs to be executed as a shell command, so we do so by prepending the necessary exclamation mark. Below we'll install a really useful corpus analysis package called [Text-Fabric](https://annotation.github.io/text-fabric/), which we will use later on.

In [11]:
! pip install text-fabric



### After this runs, you need to restart the kernel of the notebook to make the installation importable

Note that if you do not have an Anaconda installation of Python active, the syntax may be different. Some people may need to say `pip3 install`. See more at:
### [PIP Installing tutorials](https://packaging.python.org/tutorials/installing-packages/)
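
Before installing, it can be handy to check whether a package is already importable. Here is a minimal sketch using the standard library's `importlib.util.find_spec`. The helper name `is_installed` is hypothetical, and note that a package's import name can differ from the name you give to pip.

```python
import importlib.util

def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None

# collections ships with Python, so it should always be found
print(is_installed("collections"))

# A deliberately nonsensical name that should not be found
print(is_installed("surely_not_a_real_package_name"))
```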



## Read/Write Files

### [Have a look at the Python documentation on read/write](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files)

Let's look at how to open basic plain text files. We will open the readme for the course. We will need to think about the file path to do that. A file path points to a location on your machine where a file is. A relative path lets us point to a file relative to this notebook's current position. 

Two dots (`..`) mean "move up one folder or directory". 

###  [Please read about the syntax of file paths](https://automatetheboringstuff.com/chapter8/)
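
To see what a relative path actually points to, here is a small sketch using the standard library's `pathlib`. It only inspects the path, so it runs whether or not the file exists on your machine.

```python
from pathlib import Path

# '..' means "the parent of the current folder"
readme_path = Path("..") / "README.md"

print(readme_path)            # the relative path as written
print(readme_path.resolve())  # the absolute path it points to on this machine
print(readme_path.exists())   # True only if the file is really there
```

If `exists()` prints `False`, the notebook is probably running from a different folder than you expect.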

In [12]:
readme_file = '../README.md'

Now we can open the file. To do so, I will show you the simplest, but not only, method.

In [13]:
with open(readme_file, 'r', encoding='utf8') as infile:
    readme = infile.read()

We have loaded the readme file for the course. Note the important function `open`, which takes one obligatory argument, a file path. We also supply it a second argument, `'r'`, which stands for "read". 

**Important: for reading files, supply `'r'`; for writing files, supply `'w'`**.

We also supply an optional argument, `encoding`, to make sure that Python reads the file with [`utf8` encoding](https://www.w3schools.com/charsets/ref_html_utf8.asp).

When we open the file, we assign the file object to a variable, `infile`. We read in the text of that file as a string with `infile.read()`.

We can see that the result is a string:

In [16]:
type(readme)

str

And we can treat it like any other string:

In [18]:
print(readme[:1000])



# Python for Linguists and Humanists 

**Cody Kingham, `cak47[put "at-sign" here].cam.ac.uk`**

**NOTE: this course will be updated regularly throughout the next few weeks. -12/04/2020**

Much of this course material is directly adapted from the [Python for Text Analysis Course](https://github.com/cltl/python-for-text-analysis) at the Vrije Universiteit Amsterdam. I take care to indicate those materials which are directly copied from that course. A special thanks to [Chantal van Son](https://github.com/ChantalvanSon), [Evan Miltenburg](https://github.com/evanmiltenburg), [Marten Postma](https://github.com/MartenPostma), [Filip Ilievski](https://github.com/filievski), Pia Sommerauer, and the [Computational Lexicology & Terminology Lab](http://www.cltl.nl) at the VU.

Pandas chapters are from [Joris Van den Bossche's Pandas Tutorial](https://github.com/jorisvandenbossche/pandas-tutorial)

## Contents

* [Intro](#Intro) - description of the course 
* [Course Schedule](#Course-Schedule) 
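
Reading everything at once with `.read()` is fine for small files. As an alternative sketch, we can also loop over the file object to get one line at a time. The example first writes a throwaway file in a temporary folder so that it is self-contained (writing is covered next); the file name and contents are made up.

```python
import os
import tempfile

# Write a small throwaway file so the example is self-contained
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt")
with open(sample_path, "w", encoding="utf8") as outfile:
    outfile.write("first line\nsecond line\nthird line\n")

# Iterating over the file object yields one line at a time
lines = []
with open(sample_path, "r", encoding="utf8") as infile:
    for line in infile:
        lines.append(line.rstrip("\n"))  # drop the trailing newline

print(lines)
print(infile.closed)  # the with block has already closed the file
```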

We can write a file in a similar way. Let's create a blank file and save it in this same folder. We're going to put the contents of the following string in the file:

In [22]:
shakespeare = """

SCENE I. Rousillon. The COUNT's palace.

Enter BERTRAM, the COUNTESS of Rousillon, HELENA, and LAFEU, all in black
COUNTESS
In delivering my son from me, I bury a second husband.
BERTRAM
And I in going, madam, weep o'er my father's death
anew: but I must attend his majesty's command, to
whom I am now in ward, evermore in subjection.
LAFEU
You shall find of the king a husband, madam; you,
sir, a father: he that so generally is at all times
good must of necessity hold his virtue to you; whose
worthiness would stir it up where it wanted rather
than lack it where there is such abundance.
COUNTESS
What hope is there of his majesty's amendment?
LAFEU
He hath abandoned his physicians, madam; under whose
practises he hath persecuted time with hope, and
finds no other advantage in the process but only the
losing of hope by time.
COUNTESS
This young gentlewoman had a father,--O, that
'had'! how sad a passage 'tis!--whose skill was
almost as great as his honesty; had it stretched so
far, would have made nature immortal, and death
should have play for lack of work. Would, for the
king's sake, he were living! I think it would be
the death of the king's disease.

"""

This is the string we will write. Here is how we will write it:

In [23]:
new_file = 'shakespeare.txt'

with open(new_file, 'w', encoding='utf8') as outfile:
    outfile.write(shakespeare)

The new file has now been placed in the same folder as this notebook. 

If it worked, you should be able to click on this link and open it in the browser:

[shakespeare.txt](shakespeare.txt)
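
One thing to watch out for: `'w'` overwrites a file that already exists. A third mode, `'a'`, appends to the end instead. Here is a hedged sketch that writes, appends, and then reads the result back, using a throwaway file in a temporary folder (the file name `demo.txt` is made up).

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.txt")

# 'w' creates the file, or silently overwrites it if it already exists
with open(demo_path, "w", encoding="utf8") as outfile:
    outfile.write("COUNTESS\n")

# 'a' appends to the end instead of overwriting
with open(demo_path, "a", encoding="utf8") as outfile:
    outfile.write("BERTRAM\n")

# Read the file back to check what ended up in it
with open(demo_path, "r", encoding="utf8") as infile:
    contents = infile.read()

print(contents)
```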

### Importing CSV files

CSV is a very useful kind of file which contains tabular data. CSVs can be read and written by most popular spreadsheet programs, like Excel or Google Sheets. We can use the `csv` module to access csv files. 

In [24]:
import csv

There are several ways to import / write csv files with the `csv` module. So

### [please refer to the csv module documentation](https://docs.python.org/3/library/csv.html)

We have a .csv for the course already stored in data. The file path is the following:

In [25]:
titanic = '../data/titanic.csv'

Opening this csv file is very similar to opening text files, except that now we will make use of the csv module.

In [27]:
# the csv docs recommend newline='' so the csv module handles line endings itself
with open(titanic, 'r', encoding='utf8', newline='') as infile:
    reader = csv.reader(infile)
    titanic_data = list(reader)

We have imported all of the csv rows as a list. Let's peek at the list.

In [32]:
titanic_data[:2]

[['PassengerId',
  'Survived',
  'Pclass',
  'Name',
  'Sex',
  'Age',
  'SibSp',
  'Parch',
  'Ticket',
  'Fare',
  'Cabin',
  'Embarked'],
 ['1',
  '0',
  '3',
  'Braund, Mr. Owen Harris',
  'male',
  '22',
  '1',
  '0',
  'A/5 21171',
  '7.25',
  '',
  'S']]
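
Notice that every value comes back as a string, even the numbers. As a sketch of one way to handle this, `csv.DictReader` lets us look values up by column name, and we can convert them ourselves. To keep the example self-contained, the header and first row shown above are reproduced as an in-memory file with `io.StringIO`.

```python
import csv
import io

# The header and first row shown above, reproduced as an in-memory file
sample = io.StringIO(
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n"
    '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n'
)

# DictReader uses the first row as keys for every following row
reader = csv.DictReader(sample)
passengers = list(reader)

first = passengers[0]
print(first["Name"])

# Every value arrives as a string, so numbers need explicit conversion
age = int(first["Age"])
fare = float(first["Fare"])
print(age, fare)
```

Empty cells, like `Cabin` here, come back as empty strings, so check for those before converting a whole column.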

We can write csv files using similar syntax. Here is an example where we have a list of lists, and each list represents a row in the dataset we want to export.

In [33]:
rows = [
    [1, 'some data1', 'True'],
    [2, 'some data2', 'False'],
    [3, 'some data3', 'True'],
    [4, 'some data4', 'False'],
    [5, 'some data5', 'True'],
    [6, 'some data6', 'True']
]

# newline='' stops extra blank lines appearing between rows on Windows
with open('some_data.csv', 'w', encoding='utf8', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerows(rows)

There are other ways to also add header information. To do that...

### [Please read over the .csv documentation](https://docs.python.org/3/library/csv.html)
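
As one possible sketch, the simplest way to add a header is to write it as its own row before the data rows. The column names below are hypothetical, and an in-memory buffer stands in for a real file so the example runs anywhere.

```python
import csv
import io

# Hypothetical column names for rows like the ones used above
header = ["id", "label", "flag"]
rows = [
    [1, "some data1", "True"],
    [2, "some data2", "False"],
]

# An in-memory buffer stands in for a real file opened with newline=''
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(header)   # one call for the single header row
writer.writerows(rows)    # then all the data rows

csv_text = buffer.getvalue()
print(csv_text)
```

The `csv` module also provides `DictWriter`, which writes rows from dictionaries and has a `writeheader` method for the same purpose.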