# Learning objectives

1. Begin to understand the working directory
2. Import .txt and .csv files

# Importing data from files

No set of basic skills is complete without learning how to import data from files. Remember to restart your kernel if your file paths get messed up!

### Getting your bearings

First, import pandas to simplify .csv imports (reading .txt files is built into Python). Then, use `!pwd` to check the location of your "working directory" (the folder on your computer that Python looks in by default when you give it a file name).

In [1]:
import pandas as pd

In [2]:
# "print" working directory
!pwd

/Users/tomvannuenen/Documents/GitHub/DIGHUM101-2022/Notebooks/Week1


We actually want the "Data/" folder inside of the main "DIGHUM101-2022" directory. We can change the working directory with the ["os" module](https://docs.python.org/3/library/os.html), which lets Python interact with your computer's operating system.

In [3]:
# Import the os module
import os

# Another way to check the cwd ("current" working directory) using the os module
os.getcwd()

'/Users/tomvannuenen/Documents/GitHub/DIGHUM101-2022/Notebooks/Week1'

![path](../../Img/path.png)

Argh! Our default working directory is wherever we launched the notebook - in our case the "Week1" folder. We want to access the "Data" folder, which is two levels "up", inside of the main "DIGHUM101-2022" directory. 
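To see why the working directory matters, here is a minimal sketch of how Python turns a bare file name into a full path. It only uses `os.path.abspath()` and `os.path.exists()`; the variable names are illustrative.

```python
import os

# A relative file name is resolved against the current working directory
filename = "frankenstein.txt"
full_path = os.path.abspath(filename)

print(full_path)                  # where Python would look for the file
print(os.path.exists(full_path))  # False unless the file is in the cwd
```

If the second line prints `False`, Python is looking in the wrong folder, which is the situation described above.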

Now we can change the working directory to the correct location.

- We could type `os.chdir("../")` to go up one level into the "Notebooks" directory.  
- Or, we could type `os.chdir("../../")` to go up **two** levels into the "DIGHUM101-2022" directory
- Or, we could move into the "Data" directory in one line by typing `os.chdir("../../Data")`

> NOTE: You will learn more about navigating file paths in week 4.
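To check where each of the relative paths above would lead without actually changing directories, the sketch below joins them onto the starting path from the `os.getcwd()` output. It uses `posixpath` only so that the forward-slash path behaves the same on any operating system; `resolved` is an illustrative name.

```python
import posixpath

# Starting directory, copied from the os.getcwd() output in this notebook
start = "/Users/tomvannuenen/Documents/GitHub/DIGHUM101-2022/Notebooks/Week1"

resolved = {}
for relative in ["../", "../../", "../../Data"]:
    # Join the relative path onto the start, then collapse the ".." parts
    resolved[relative] = posixpath.normpath(posixpath.join(start, relative))
    print(relative, "->", resolved[relative])
```

Each `../` removes one folder from the end of the path, so `../../Data` ends up in the "Data" folder of the main directory.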

In [4]:
# We include two ../ because we want to go two levels up in the file structure
os.chdir("../../Data")

In [5]:
%pwd

'/Users/tomvannuenen/Documents/GitHub/DIGHUM101-2022/Data'

In [6]:
os.getcwd()

'/Users/tomvannuenen/Documents/GitHub/DIGHUM101-2022/Data'

Now we can use `ls` to list the files in that directory.

In [7]:
!ls

Geo                          feminism.json
childrens_lit.csv            feminism.xml
compound_figure.pdf          frankenstein.txt
correspondence-data-1585.csv gapminder-FiveYearData.csv
dracula.txt                  human-rights
example.json                 iris.csv
example.xml                  music_reviews.csv
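
The `!ls` command is handed to the shell, so it may not be available on every system. A rough Python-only alternative, assuming you only need the file names, is `os.listdir()`. The `.csv` filter below is just an illustration.

```python
import os

# List the entries of the current working directory, sorted by name
entries = sorted(os.listdir("."))

# Keep only the .csv files (this filter is purely illustrative)
csv_files = [name for name in entries if name.endswith(".csv")]
print(csv_files)
```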


# Import .txt file

Now that Python is looking in the correct location, we can pass a single argument to the `open()` function: the name of the file! `open()` returns a file object, and calling its `.read()` method returns the entire contents of the file as one string.
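
As a side sketch, many Python programmers wrap `open()` in a `with` block so the file is closed automatically. The example below writes a tiny throwaway file first so that it runs anywhere; "sample.txt" is a hypothetical name, not one of the course files.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a small throwaway file to read back ("sample.txt" is hypothetical)
    sample_path = os.path.join(tmp_dir, "sample.txt")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("Letter 1\nLetter 2\n")

    # The with-block closes the file automatically when it ends
    with open(sample_path, encoding="utf-8") as f:
        text = f.read()  # the whole file as one string

print(type(text))  # <class 'str'>
print(len(text))   # 18 characters, counting the two newlines
```

The cells below use the shorter `open(...).read()` form, which returns the same string.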

In [8]:
frank = open("frankenstein.txt")

In [9]:
frank = open("frankenstein.txt").read()

# If your characters don't look quite right, try adding the argument encoding = "utf-8"
# frank = open("frankenstein.txt", encoding = "utf-8").read()
print(frank)

﻿
Project Gutenberg's Frankenstein, by Mary Wollstonecraft (Godwin) Shelley

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.net


Title: Frankenstein
       or The Modern Prometheus

Author: Mary Wollstonecraft (Godwin) Shelley

Release Date: June 17, 2008 [EBook #84]
Last updated: January 13, 2018

Language: English

Character set encoding: UTF-8

*** START OF THIS PROJECT GUTENBERG EBOOK FRANKENSTEIN ***




Produced by Judith Boss, Christy Phillips, Lynn Hanninen,
and David Meltzer. HTML version by Al Haines.
Further corrections by Menno de Leeuw.



Frankenstein;


or, the Modern Prometheus




by


Mary Wollstonecraft (Godwin) Shelley






CONTENTS




Letter 1

Letter 2

Letter 3

Letter 4

Chapter 1

Chapter 2

Chapter 3

Chapter 4

Chapter 5

Chapter 6

Chapter 7

Chapter 8

Chapter

In [10]:
# What type is frank?
type(frank)

str

In [11]:
# How many times does the word "London" appear in frankenstein.txt?
frank.count("London")

8
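
Keep in mind that `.count()` is case-sensitive, so "London" and "london" are counted separately. A minimal sketch with a made-up sentence (not text from the novel) shows the difference:

```python
# A made-up sample sentence, used only for illustration
sample = "London is far. I left london for LONDON, then London again."

exact = sample.count("London")             # exact-case matches only
any_case = sample.lower().count("london")  # lowercase first, then count

print(exact)     # 2
print(any_case)  # 4
```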

# Import a .csv file

Comma-separated values (CSV) files are common because they are relatively small and open cleanly in spreadsheet software. A CSV file is just a text file that uses commas (or other separators) to mark column breaks.

> NOTE: You will learn more about Pandas DataFrames in Week 2!
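
To make the "just a text file" point concrete, here is a small sketch that parses CSV text with Python's built-in `csv` module instead of pandas. The three lines of text are a trimmed-down version of the gapminder rows shown further down; `raw_text`, `header`, and `records` are illustrative names.

```python
import csv
import io

# CSV is plain text: one line per row, commas between columns
raw_text = (
    "country,year,lifeExp\n"
    "Afghanistan,1952,28.801\n"
    "Afghanistan,1957,30.332\n"
)

# io.StringIO lets csv.reader treat the string like an open file
rows = list(csv.reader(io.StringIO(raw_text)))
header, records = rows[0], rows[1:]

print(header)   # ['country', 'year', 'lifeExp']
print(records)  # every value is still a string at this point
```

pandas does this splitting for you and also converts the numeric columns to numbers, which is why we use it for the rest of this section.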

### Gapminder dataset

What is the [gapminder-FiveYearData](https://en.wikipedia.org/wiki/Gapminder_Foundation) dataset about?

In [12]:
gap = pd.read_csv("gapminder-FiveYearData.csv")
gap.head()

Unnamed: 0,country,year,pop,continent,lifeExp,gdpPercap
0,Afghanistan,1952,8425333.0,Asia,28.801,779.445314
1,Afghanistan,1957,9240934.0,Asia,30.332,820.85303
2,Afghanistan,1962,10267083.0,Asia,31.997,853.10071
3,Afghanistan,1967,11537966.0,Asia,34.02,836.197138
4,Afghanistan,1972,13079460.0,Asia,36.088,739.981106


### Music reviews dataset

What about the music reviews dataset?

This dataset is separated by tabs instead of commas. However, tab-separated data can be stored in a .csv file just the same; we just need to pass `"\t"` to the `sep` parameter.

In [13]:
music = pd.read_csv("music_reviews.csv", sep = "\t")
music.head()

Unnamed: 0,album,artist,genre,release_date,critic,score,body
0,Don't Panic,All Time Low,Pop/Rock,2012-10-09 00:00:00,Kerrang!,74.0,While For Baltimore proves they can still writ...
1,Fear and Saturday Night,Ryan Bingham,Country,2015-01-20 00:00:00,Uncut,70.0,There's nothing fake about the purgatorial nar...
2,The Way I'm Livin',Lee Ann Womack,Country,2014-09-23 00:00:00,Q Magazine,84.0,All life's disastrous lows are here on a caree...
3,Doris,Earl Sweatshirt,Rap,2013-08-20 00:00:00,Pitchfork,82.0,"With Doris, Odd Future’s Odysseus is finally b..."
4,Giraffe,Echoboy,Rock,2003-02-25 00:00:00,AllMusic,71.0,Though Giraffe is definitely Echoboy's most im...
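
To see what the `sep` parameter changes, the sketch below reads the same tab-separated text twice, once with the default comma separator and once with `sep="\t"`. The text is a small stand-in built from a few values in the output above, and `io.StringIO` is used only so the example does not need a file on disk.

```python
import io

import pandas as pd

# A tiny tab-separated sample (values taken from the output above)
tsv_text = (
    "album\tartist\tscore\n"
    "Doris\tEarl Sweatshirt\t82.0\n"
    "Giraffe\tEchoboy\t71.0\n"
)

wrong = pd.read_csv(io.StringIO(tsv_text))            # default sep=","
right = pd.read_csv(io.StringIO(tsv_text), sep="\t")  # split on tabs

print(wrong.shape)  # (2, 1): each row lands in a single column
print(right.shape)  # (2, 3): three proper columns
```

If a freshly imported DataFrame has only one wide column, the wrong separator is a likely cause.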


Save your changes and open "1-8_errors-help.ipynb" to learn about error messages and finding help.