# Lesson 07: File types, Parsing Data, Plotting

Text files come in many different formats. Some of them are well-known and ubiquitous, others are custom and only associated with certain programs. 

All of them can be read by Python. Transforming the raw text in the files into meaningful information is a process known as _parsing_. 

## Common file types

These common file types all have associated suffixes, e.g. `.csv`, `.json`, `.xml`, but the file itself does not need to have a matching suffix. For example, a file can end with `.xyz` but still be a `csv` or `json` file.

### 1. `csv` - Comma separated values

* Represents _tabular_ data
* Fields in the table (columns) are separated by commas
* Records in the table (rows) are separated by new line characters `\n`

Example:

```
Beam_ID, Length, E, I
0, 130, 2.2e6, 495
1, 83.5, 1.55e6, 320
```

#### Parsing csv data

Python provides the `csv` module.

Using the `csv` module breaks each line in the file into a list. Each item in the list is one "cell" in the row, read as a string.

```python
import csv

file_name = "my_file.csv"
file_data = []  # acc
with open(file_name, 'r', newline='') as file:  # The keyword is newline, not new_line
    reader = csv.reader(file)
    for row in reader:
        file_data.append(row)
```
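Because the reader returns every cell as a string, numeric columns usually need converting before you can calculate with them. The following is a minimal sketch using the example table above, held in an in-memory string (via `io.StringIO`) rather than a real file so that it runs on its own. The names `raw_text` and `beams` are illustrative only.

```python
import csv
import io

# The example table from above, held in memory so the sketch runs without a file
raw_text = "Beam_ID, Length, E, I\n0, 130, 2.2e6, 495\n1, 83.5, 1.55e6, 320\n"

# skipinitialspace=True drops the space that follows each comma in this example
reader = csv.reader(io.StringIO(raw_text), skipinitialspace=True)
header = next(reader)  # The first row holds the column names

beams = []  # acc
for row in reader:
    # Every cell arrives as a str, so convert each one explicitly
    beam = {
        "Beam_ID": int(row[0]),
        "Length": float(row[1]),
        "E": float(row[2]),
        "I": float(row[3]),
    }
    beams.append(beam)
```

After this runs, `beams` is a list with one dictionary per row, which is often easier to work with than a list of lists of strings.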

### 2. `json` - JavaScript Object Notation

* Represents _hierarchical_ data
    * Some hierarchical data can be represented in a table
    * However, hierarchical data is good for storing data that cannot be represented in a table
* The format is very much like a Python dictionary in a text file
* Dictionaries can represent arbitrarily nested data of the common Python data types:
    * `float`
    * `int`
    * `str`
    * `list`
    * `dict`
* Output files from ForteWeb are examples of JSON data

Example:
```json
{
    "beam_0": {
        "spans": [1200, 2400],
        "E": 2.2e6,
        "I": 495
    },
    "beam_1": {
        "spans": [23, 120, 60],
        "E": 1.55e6,
        "I": 320
    }
}
```

Notice the difference between the JSON and CSV representations of the beams. CSV cannot easily represent a beam with multiple spans, but JSON can hold a list of spans as one of its "cells". Note also that, unlike a Python dictionary, JSON does not allow a trailing comma after the last item.

#### Parsing json data

Python provides the `json` module.

Using the `json` module reads the data into a Python dictionary.

```python
import json

file_name = "my_file.json"
with open(file_name, 'r') as file:
    file_data = json.load(file)  # json.load needs an open file object, not a file name
```
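Once loaded, the data behaves like any other Python dictionary. The following is a minimal sketch that parses the beam example above from a string using `json.loads` (note the trailing "s", for "string") so that it runs without a file. The names `raw_text` and `total_lengths` are illustrative only.

```python
import json

# The beam example from above, held in a string so the sketch runs without a file
raw_text = """
{
    "beam_0": {"spans": [1200, 2400], "E": 2.2e6, "I": 495},
    "beam_1": {"spans": [23, 120, 60], "E": 1.55e6, "I": 320}
}
"""

beams = json.loads(raw_text)  # json.loads parses a str; json.load reads an open file

total_lengths = {}  # acc
for beam_id, beam in beams.items():
    # "spans" is already a Python list of numbers, so no conversion is needed
    total_lengths[beam_id] = sum(beam["spans"])
```

Unlike the `csv` module, the `json` module converts numbers for you, so `beams["beam_0"]["E"]` is already a `float`.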

### 3. `xml` - Extensible Markup Language

* Represents complicated hierarchical data but, like JSON, can also be used to represent tabular data
* Richer and more nuanced than JSON
* Can be _coerced_ into a Python dictionary but with some compromise
* Represented in Python by an XML ElementTree
* Decon files are an example of an XML file

Example:

```xml
<!-- A well-formed XML file has exactly one root element; "beams" is an illustrative name -->
<beams>
    <beam>
        <beam_ID>beam_0</beam_ID>
        <beam_spans>
            <beam_span>1200</beam_span>
            <beam_span>2400</beam_span>
        </beam_spans>
        <E>2200000</E>
        <I>495</I>
    </beam>
    <beam>
        <beam_ID>beam_1</beam_ID>
        <beam_spans>
            <beam_span>23</beam_span>
            <beam_span>120</beam_span>
            <beam_span>60</beam_span>
        </beam_spans>
        <E>1550000</E>
        <I>320</I>
    </beam>
</beams>
```

#### Parsing XML data

Python provides the `xml` package. Its `xml.etree.ElementTree` module reads an XML file into an element tree.

```python
from xml.etree.ElementTree import ElementTree

file_name = "my_file.xml"
xml_data = ElementTree()
root = xml_data.parse(file_name)  # .parse() returns the root element of the tree
```

Navigating an XML element tree requires some more knowledge of the XML file format (which can get quite complicated). 
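For a simple file like the beam example, a few element methods (`findall`, `findtext`) go a long way. The following is a minimal sketch, assuming the `<beam>` elements are wrapped in a single root element (named `<beams>` here purely for illustration). It parses from a string with `ET.fromstring` so that it runs without a file.

```python
import xml.etree.ElementTree as ET

# One beam from the example, wrapped in an assumed <beams> root element
raw_text = """
<beams>
    <beam>
        <beam_ID>beam_0</beam_ID>
        <beam_spans>
            <beam_span>1200</beam_span>
            <beam_span>2400</beam_span>
        </beam_spans>
        <E>2200000</E>
        <I>495</I>
    </beam>
</beams>
"""

root = ET.fromstring(raw_text.strip())  # Returns the root element directly

beams = {}  # acc
for beam in root.findall("beam"):  # Direct children of the root named "beam"
    beam_id = beam.findtext("beam_ID")  # Text inside the first matching child
    # The path "beam_spans/beam_span" steps down two levels in the tree
    spans = [float(span.text) for span in beam.findall("beam_spans/beam_span")]
    beams[beam_id] = {
        "spans": spans,
        "E": float(beam.findtext("E")),  # Element text is always a str
        "I": float(beam.findtext("I")),
    }
```

As with CSV, all values come back as strings, so the numeric conversion is up to you.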

For more information, this tutorial is a good introduction to the XML file format and working with XML data in Python: https://www.datacamp.com/community/tutorials/python-xml-elementtree

## Application file formats and others

Often, text file formats created by software applications combine elements of other file formats in their own way.

### Parsing application custom file formats

Parsing custom file formats can be done by creating new data definitions along with a series of functions to process the raw data. In other words, you write a small program.

Parsing a custom file format can be simple if you are just looking to read and/or write specific portions of the file. It can become a lot more complicated if you seek to fully parse the complete format.

### spColumn `.cti` files

spColumn has two file formats: `.col` and `.cti`

* `.col` is a _binary_ file format and is not readily readable without knowing the binary encoding
* `.cti` is a _text_ file format and can be read and modified by any text editor

spColumn documents how the `.cti` file format works on their site: https://structurepoint.org/spColumn-Online-Manual/spColumn/Appendix/spColumn_Text_Input_CTI_file_format.htm

Generally, a `.cti` file is a series of `csv` tables separated by table headings in `[...]`.

Here is a portion of what a `.cti` file looks like:

```
[Reduction Factors]
0.800000,0.850000,0.650000,0.100000,0.000000
[Design Criteria]
0.010000,0.080000,30.480001,1.000000
[External Points]
5
-304.799988,-457.199982
304.799988,-457.199982
304.799988,457.199982
-304.799988,457.199982
-304.799988,-457.199982
[Internal Points]
0
[Reinforcement Bars]
14
500,-237.549988,389.949982
500,-0.000015,389.949982
500,237.549957,389.949982
500,-237.549988,-389.949982
```
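One way to read an excerpt like the one above is to treat each `[...]` line as a section heading and split the rows beneath it on commas. The following is a minimal sketch of that idea; the function name `parse_cti_sections` is hypothetical, and the sketch does not interpret what the rows mean. For example, single-value rows such as `5` appear to be counts of the rows that follow, but that should be confirmed against the spColumn manual.

```python
# Part of the excerpt above, held in a string so the sketch runs without a file
raw_text = """[Reduction Factors]
0.800000,0.850000,0.650000,0.100000,0.000000
[Design Criteria]
0.010000,0.080000,30.480001,1.000000
[External Points]
5
-304.799988,-457.199982
304.799988,-457.199982
304.799988,457.199982
-304.799988,457.199982
-304.799988,-457.199982
[Internal Points]
0
"""


def parse_cti_sections(text):
    """Return a dict mapping each section heading to its rows (lists of str)."""
    sections = {}  # acc
    current = None  # Heading of the section currently being read
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # Skip blank lines
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # Drop the surrounding brackets
            sections[current] = []
        elif current is not None:
            sections[current].append(line.split(","))
    return sections


sections = parse_cti_sections(raw_text)

# Cells are still strings; convert them once you know what a section contains
design_criteria = [float(cell) for cell in sections["Design Criteria"][0]]
```

Keeping the first pass this generic means the same function can read every section, and the section-specific interpretation can live in separate, smaller functions.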

# 2. Navigate your computer's file system with `pathlib`

Using the `pathlib` module from the Python standard (built-in) library, you can easily navigate and manipulate files on your computer.

```python
import pathlib # Loads the pathlib module into memory
```

The primary object within the pathlib module is `Path`. You create a new `Path` object by calling it like a function.

```python
my_path = pathlib.Path() # Relative path to cwd (wherever your current notebook is)
```

This creates a _relative_ Path object that represents your current working directory. No matter where you are in your computer, your relative path will be represented by `.`. If you were to navigate a directory down, your path would become `./new_directory/`.

If you would like to make your path object an _absolute_ path, you do so with the `.resolve()` method:

```python
my_path_abs = my_path.resolve()
my_path_abs
```
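To see the difference between the two kinds of path, you can inspect them directly. The following is a short sketch using `is_absolute()`, a standard `Path` method; the variable names other than `my_path` and `my_path_abs` are illustrative only.

```python
import pathlib

my_path = pathlib.Path()  # Relative path to the current working directory
relative_text = str(my_path)  # The relative path is written as "."
relative_is_absolute = my_path.is_absolute()  # False: it depends on where you are

my_path_abs = my_path.resolve()  # Full path from the root of the file system
resolved_is_absolute = my_path_abs.is_absolute()  # True
```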

## Things you can do with a `Path` object

### Navigate to a new directory

```python
my_new_path = my_path_abs / "New_directory_name" # Use the divide operator, neat huh?
```

### Make a new directory

```python
my_new_path = my_path_abs / "New_directory_name" # This directory does not exist yet
my_new_path.mkdir() # Now it does
```

### Make a path to a file

```python
# This file path does not exist yet but that's ok
my_file = my_path_abs / "New_directory_name" / "My_excel_file.xlsx"
```

### Check to see if a path exists
```python
my_new_path.exists()
my_file.exists()
```

### Create a new empty file
```python
my_file.touch()
my_file.exists()
```

### Rename a file
```python
my_file.rename(my_file.parent / "new_file_name.txt") # Use absolute path, can be used to move files too
```
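The steps above can be tried safely by working inside a temporary directory, which Python deletes again when the `with` block ends. The following is a minimal sketch; using `tempfile.TemporaryDirectory` is a choice made here for illustration, and the variable names `base`, `new_file`, and `names` are hypothetical.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)  # Stand-in for my_path_abs, so nothing real is touched

    my_new_path = base / "New_directory_name"
    my_new_path.mkdir()  # Create the directory

    my_file = my_new_path / "My_excel_file.xlsx"
    existed_before = my_file.exists()  # The path object exists, the file does not
    my_file.touch()  # Create an empty file
    existed_after = my_file.exists()

    new_file = my_file.parent / "new_file_name.txt"
    my_file.rename(new_file)  # The old name no longer exists after this
    old_name_gone = not my_file.exists()

    # iterdir() lists the contents of a directory as Path objects
    names = [path.name for path in my_new_path.iterdir()]
```

Note that `mkdir()` raises an error if the directory already exists, which is why starting from an empty temporary directory keeps the sketch repeatable.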

### And more...

In **Edit mode**, type `my_new_path.` then hit `[TAB]` and wait for Jupyter to show you the list of all the other methods and attributes of your path!

We will come back to paths at the end of this lesson...