# Working with other file formats: csv and json

## Outline:
- what is a file format?
- xlsx
- csv
- json

## 1. What is a file format?
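- A file format is a standard way in which data is ENCODED (as binary or as text, e.g. ASCII) and ORGANIZED for storage.
- To identify the format of a file, we usually look at its extension (the part after the last '.' in the file name). The extension is only a naming convention, so it is a hint rather than a guarantee.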
- eg: 'test.csv' is a csv file
- Some common file formats are:
    - xlsx: excel file
    - csv: comma-separated values file
    - json: JavaScript Object Notation
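
As a minimal sketch of the extension check described above, the standard-library `pathlib` module can pull the extension out of a file name. The `KNOWN_FORMATS` table and the `guess_format` helper are made up for illustration; they only look at the name, not at the content of the file.

```python
from pathlib import Path

# Hypothetical lookup table, for illustration only
KNOWN_FORMATS = {
    '.xlsx': 'excel file',
    '.csv': 'comma-separated values file',
    '.json': 'JavaScript Object Notation file',
}

def guess_format(filename):
    """Guess the file format from the extension (the part after the last '.')."""
    extension = Path(filename).suffix.lower()  # '' when there is no extension
    return KNOWN_FORMATS.get(extension, 'unknown')

print(guess_format('test.csv'))   # comma-separated values file
print(guess_format('pet.JSON'))   # JavaScript Object Notation file
print(guess_format('notes'))      # unknown
```

Lower-casing the suffix is a design choice here so that `.JSON` and `.json` are treated the same.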

## 2. xlsx file:
- xlsx is the Microsoft Excel Open XML file format. It is a spreadsheet format, where all data points are represented in the rows and columns of a table.
```python
import pandas as pd  # use pandas to work with xlsx

file_dir = ''  # insert the path to the xlsx file

df = pd.read_excel(file_dir)
# instead of a local path, the URL of the file can be passed to read_excel()
```



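## 3. csv file:
- csv stands for comma-separated values: a plain-text format where each line is a row and the values in a row are separated by commas.
- A csv file can hold the same kind of tabular data as a spreadsheet.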
```python
import pandas as pd #using pandas to work with the csv file

file_dir = '' #insert the file directory

df = pd.read_csv(file_dir,
                 header=None)  # header=None: the file has no header row, so the columns are numbered 0, 1, 2, ...

# Adding column names to the df (one name per column):
df.columns = ['col1', 'col2', '...']
```                 
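
To make the effect of `header` concrete, here is a small sketch that reads the same made-up csv text with and without `header=None`. The data and variable names are invented for illustration; `io.StringIO` is only used so that the example runs without a file on disk.

```python
import io
import pandas as pd

# Made-up csv content; io.StringIO lets pandas read it like a file
csv_text = "name,age\ncutie,6\nbob,30\n"

# Default: the first row is used as the column names
df_with_header = pd.read_csv(io.StringIO(csv_text))
print(list(df_with_header.columns))   # ['name', 'age']
print(len(df_with_header))            # 2

# header=None: every row is data, columns are numbered 0, 1, ...
df_no_header = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(df_no_header.columns))     # [0, 1]
print(len(df_no_header))              # 3

# names=... supplies the column names while reading a file without a header row
df_named = pd.read_csv(io.StringIO("cutie,6\nbob,30\n"),
                       header=None,
                       names=['name', 'age'])
print(list(df_named.columns))         # ['name', 'age']
```

Passing `names` while reading does the same job as assigning `df.columns` afterwards.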

## 4. json file:
- JavaScript Object Notation is a lightweight data-interchange format. 
- A json object can take the form of a nested dictionary.

```python
import json

# create a dict:
pet = {
    'name': 'cutie',
    'age': '6 months',
    'color': 'gold',
    'breed': 'British',
    'family': {
        'dad': 'bob',
        'mom': 'bin',
        'sister': 'nii',
        'brother': 'doo'
    }
}

type(pet)
```

```
dict
```
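
Because the data is a nested dictionary, values inside the inner dictionary are reached by chaining keys. A minimal sketch, using a shortened stand-in called `pet_small` (a made-up name, so the `pet` dict above is left untouched):

```python
# Shortened stand-in for the pet dict, for illustration only
pet_small = {
    'name': 'cutie',
    'family': {'dad': 'bob', 'mom': 'bin'},
}

# Chain the keys to reach a nested value
print(pet_small['family']['dad'])                    # bob

# .get() returns a default instead of raising KeyError for a missing key
print(pet_small['family'].get('uncle', 'unknown'))   # unknown
```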

### 4a. Serialization:
- Serialization is the process of turning data into a format suitable for storage or sharing (here: a Python dict into JSON text).


#### Using `json.dump()` to write a dict into a file:

```python
with open('pet.json', 'w') as file:  # create (or overwrite) the file and open it for writing
    json.dump(pet,   # the dict to serialize
              file   # the open file object to write to
             )
```
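
To see what `json.dump()` actually puts in the file, the sketch below writes a small made-up dict and reads the raw text back. The file name `pet_small.json` is hypothetical, and a temporary directory is used so nothing is left on disk.

```python
import json
import tempfile
from pathlib import Path

pet_small = {'name': 'cutie', 'age': '6 months'}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'pet_small.json'   # hypothetical file name

    with open(path, 'w') as file:
        json.dump(pet_small, file)

    # Read the raw text back to see exactly what was written
    written_text = path.read_text()

print(written_text)   # {"name": "cutie", "age": "6 months"}
```

Without extra arguments the output is a single line, and the strings are written with double quotes, as JSON requires.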

#### Using `json.dumps()` to turn a dict into a string, then write it into a file:

```python
# turn a dict into a string:
pet_str = json.dumps(pet,
                     indent=6  # number of spaces per indentation level
                    )

print(pet_str)
```

```
{
      "name": "cutie",
      "age": "6 months",
      "color": "gold",
      "breed": "British",
      "family": {
            "dad": "bob",
            "mom": "bin",
            "sister": "nii",
            "brother": "doo"
      }
}
```


```python
# create and open a new file:
with open('pet1.json', 'w') as file:
    file.write(pet_str)
```
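
During serialization, Python values are converted to their JSON counterparts. The sketch below uses a made-up `record` dict to show a few of these conversions: `True` becomes `true`, `None` becomes `null`, a tuple becomes a JSON array, and a non-string key becomes a string.

```python
import json

# Made-up record mixing several Python types
record = {
    'name': 'cutie',
    'vaccinated': True,
    'owner': None,
    'weights_kg': (2.5, 3.1),   # a tuple
    1: 'integer key',
}

record_str = json.dumps(record)
print(record_str)
# {"name": "cutie", "vaccinated": true, "owner": null, "weights_kg": [2.5, 3.1], "1": "integer key"}
```

One consequence is that a round trip does not always give back the exact original types: the tuple comes back as a list and the key `1` comes back as the string `'1'`.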

#### Using `json.loads()` to turn a string into a dict (the reverse of `json.dumps()`):

```python
# str:
string = '''{
      "name": "cutie",
      "age": "6 months",
      "color": "gold",
      "breed": "British",
      "family": {
            "dad": "bob",
            "mom": "bin",
            "sister": "nii",
            "brother": "doo"
      }
}
'''
print(string)
print(type(string))
```

```
{
      "name": "cutie",
      "age": "6 months",
      "color": "gold",
      "breed": "British",
      "family": {
            "dad": "bob",
            "mom": "bin",
            "sister": "nii",
            "brother": "doo"
      }
}

<class 'str'>
```


```python
dict1 = json.loads(string)
print(dict1)
print(type(dict1))
```

```
{'name': 'cutie', 'age': '6 months', 'color': 'gold', 'breed': 'British', 'family': {'dad': 'bob', 'mom': 'bin', 'sister': 'nii', 'brother': 'doo'}}
<class 'dict'>
```
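
A short sketch of two things worth knowing about `json.loads()`: it undoes `json.dumps()`, and it raises `json.JSONDecodeError` when the string is not valid JSON. The `pet_small` dict and `bad_string` are made up for illustration.

```python
import json

pet_small = {'name': 'cutie', 'family': {'dad': 'bob'}}

# dumps followed by loads gives back an equal dict
round_trip = json.loads(json.dumps(pet_small))
print(round_trip == pet_small)   # True

# JSON requires double quotes, so this string is not valid JSON
bad_string = "{'name': 'cutie'}"
try:
    json.loads(bad_string)
    parsed_ok = True
except json.JSONDecodeError as error:
    parsed_ok = False
    print('Invalid JSON:', error.msg)
```

This is why the printed form of a Python dict (single quotes) generally cannot be pasted back into `json.loads()`.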


```python
with open('pet2.json', 'w') as file:
    json.dump(dict1, file)
```

### 4b. Deserialization:
- Deserialization is the process of turning data from a file (or a string) back into a data structure you can work with (here: JSON text into a Python dict).

#### Using `json.load()` to extract data from a json file:

```python
with open('pet2.json', 'r') as file:
    json_object = json.load(file)

print(json_object)
print(type(json_object))
```

```
{'name': 'cutie', 'age': '6 months', 'color': 'gold', 'breed': 'British', 'family': {'dad': 'bob', 'mom': 'bin', 'sister': 'nii', 'brother': 'doo'}}
<class 'dict'>
```
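
To tie the four functions together, here is a minimal round-trip sketch: `json.dump()` writes a dict to a file, and the file is then read back in two ways, with `json.load()` on the open file and with `json.loads()` on the file's text. The dict and the file name `pet_small.json` are made up, and a temporary directory keeps the example from leaving files behind.

```python
import json
import tempfile
from pathlib import Path

pet_small = {'name': 'cutie', 'family': {'dad': 'bob'}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'pet_small.json'   # hypothetical file name

    # Serialize: dict -> file
    with open(path, 'w') as file:
        json.dump(pet_small, file)

    # Deserialize, option 1: json.load() reads from the open file
    with open(path, 'r') as file:
        from_load = json.load(file)

    # Deserialize, option 2: read the text first, then json.loads() the string
    from_loads = json.loads(path.read_text())

print(from_load == pet_small)    # True
print(from_loads == pet_small)   # True
```

A way to remember the naming: the functions ending in `s` (`dumps`, `loads`) work with strings, the ones without it (`dump`, `load`) work with file objects.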
