# Manipulating Directories



File Manipulation
<br>In this lab, we will use Python to view, write, and read files, and we will explain several common file types used to store data.
<br><br>Let's first access and view directories in Python.

In [None]:
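# These are Unix shell commands. The leading ! tells Jupyter to run them in a shell, so we can use them in a notebook as well as in a terminal.
# Their output is only displayed, not returned to Python (unless you use IPython's `files = !ls` capture syntax), so this is mostly a convenience tool
# and should not be used in actual Python code.
!ls
# Note: each ! command runs in its own subshell, so `!cd` does NOT change the notebook's working directory (IPython's %cd magic does).
!cd
!ls sample_data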

sample_data
anscombe.json		      mnist_test.csv
california_housing_test.csv   mnist_train_small.csv
california_housing_train.csv  README.md


In [None]:
import os

# First, let's see our current working directory: the folder Python uses to resolve relative file paths. This is similar to the Unix command "pwd".
original_dir = os.getcwd()
print(original_dir)

# Let's see which files and directories are here. This is similar to the Unix command "ls", except that it also lists hidden entries such as .config.
print(os.listdir())

# Let's move into the sample_data directory. This is similar to the Unix command "cd".
os.chdir('./sample_data')

# Now let's check the new working directory
print("The new directory is: " + os.getcwd())

# Let's see which files are inside it
print("The new file list is  " + str(os.listdir()))

# Let's go back to our original directory (/content in Google Colab)
os.chdir(original_dir)

/content
['.config', 'sample_data']
The new directory is: /content/sample_data
The new file list is  ['README.md', 'anscombe.json', 'california_housing_train.csv', 'california_housing_test.csv', 'mnist_test.csv', 'mnist_train_small.csv']
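The same directory operations can also be written with the standard library's `pathlib` module, which represents paths as objects instead of plain strings. This is a minimal sketch, not part of the original lab; it assumes a `sample_data` folder may or may not exist in the current directory (it does in Colab), and it builds paths with `/` instead of calling `os.chdir`, so the working directory never changes.

```python
from pathlib import Path

# Path.cwd() is the pathlib counterpart of os.getcwd()
cwd = Path.cwd()
print(cwd)

# iterdir() yields a Path for every entry, including hidden ones such as .config
entries = sorted(p.name for p in cwd.iterdir())
print(entries)

# Join paths with the / operator instead of changing directories
sample_dir = cwd / "sample_data"
if sample_dir.is_dir():
    # Only list it if it exists; outside Colab this folder may be missing
    print(sorted(p.name for p in sample_dir.iterdir()))
else:
    print(f"{sample_dir} does not exist here")
```

Avoiding `os.chdir` means no later cell has to remember to change back.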


In [None]:
# Just as on your computer, Python can read and write text-based files.

# Let's write a love letter. The \n is an escape sequence that means "new line", taking the place of pressing Enter as you normally would.
# Escape sequences (a backslash followed by a character) stand for characters that would otherwise be hard to type or would break the string.
my_string = 'i love u \nhaha jk \nunless???'

# The `with` block (a context manager) opens the file, runs the indented code, then closes the file automatically, even if an error occurs.
with open('loveletter.txt', 'w') as f:
  f.write(my_string)

# Let's also check where the file was saved

print(str(os.listdir()))
print(os.getcwd())


['.config', 'loveletter.txt', 'sample_data']
/content
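Opening a file with `'w'` always starts from an empty file. To add to an existing file instead, Python's `open` also accepts `'a'` (append). A minimal sketch, assuming a throwaway temporary folder and a hypothetical file name `notes.txt` so nothing is left behind in the working directory:

```python
import os
import tempfile

# Work inside a temporary directory that is deleted when the block ends
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")  # hypothetical file name

    # 'w' creates the file (or erases it if it already exists)
    with open(path, "w") as f:
        f.write("first line\n")

    # 'a' adds to the end instead of overwriting
    with open(path, "a") as f:
        f.write("second line\n")

    with open(path, "r") as f:
        after_append = f.read()

    # Opening with 'w' again wipes the previous contents
    with open(path, "w") as f:
        f.write("fresh start\n")

    with open(path, "r") as f:
        after_overwrite = f.read()

print(repr(after_append))     # 'first line\nsecond line\n'
print(repr(after_overwrite))  # 'fresh start\n'
```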


In [None]:
# We have written the file. Let's open it again, but with 'r' for read instead of 'w' for write, and print what it says.
# You can also edit the text file yourself and run this cell again.

with open('loveletter.txt', 'r') as f:
  my_letter = f.read()
print(my_letter)


i love u 
haha jk 
unless???
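`f.read()` returns the whole file as one string. It is often handier to loop over the file object itself, which yields one line at a time. A minimal sketch: to avoid overwriting your edited `loveletter.txt`, it writes the same text to a hypothetical `loveletter_copy.txt` in a temporary folder.

```python
import os
import tempfile

letter = 'i love u \nhaha jk \nunless???'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "loveletter_copy.txt")  # hypothetical copy
    with open(path, "w") as f:
        f.write(letter)

    # Iterating over a file yields one line at a time; each line keeps its
    # trailing '\n' (except possibly the last), so we strip it off
    with open(path, "r") as f:
        lines = [line.rstrip("\n") for line in f]

print(lines)       # ['i love u ', 'haha jk ', 'unless???']
print(len(lines))  # 3
```

Reading line by line also avoids loading a very large file into memory all at once.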


In [None]:
# For fun, let's write a function that writes the same text to `count` separate files.
def write_letters(name: str, text: str, count: int):
  for x in range(count):
    # Each file gets a unique name: 'i love u_0.txt', 'i love u_1.txt', ...
    with open(f'{name}_{x}.txt', 'w') as f:
      f.write(text)

# Let's write the letter 1000 times
write_letters('i love u', 'i love you itachi uchiha', 1000)

# Even 1000 files typically take well under a second, although this task is very simple.
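To check the speed yourself, you can time the call with `time.perf_counter()`. The cell above leaves 1000 files in the working directory (which is why they show up in the `ls` listing later), so this sketch writes them into a temporary folder that is deleted afterwards. The extra `folder` parameter is a hypothetical addition, not part of the original function, and the measured time will vary with your machine and disk.

```python
import os
import tempfile
import time

def write_letters(name: str, text: str, count: int, folder: str = ".") -> None:
    """Write `text` to `count` files named {name}_{i}.txt inside `folder`."""
    for i in range(count):
        with open(os.path.join(folder, f"{name}_{i}.txt"), "w") as f:
            f.write(text)

with tempfile.TemporaryDirectory() as tmp:
    start = time.perf_counter()
    write_letters("i love u", "i love you itachi uchiha", 1000, folder=tmp)
    elapsed = time.perf_counter() - start
    n_files = len(os.listdir(tmp))  # count before the folder is deleted

print(f"Wrote {n_files} files in {elapsed:.3f} s")
```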

# Concept of Map Objects
<br>Maps are collections of key-value pairs. Keys are typically strings, and values can be anything. In Python, maps are called dictionaries (`dict`); in other languages, maps can be more complicated in terms of which data types they can store.

In [None]:
# Here is an example dictionary that represents a student.
# Notice that the keys are strings, while the values can be anything, even other dictionaries.
my_dict = {'name' : 'Ethan',
           'major': 'Nanoengineering',
           'year': 3,
           'courses': ['nano4','nano11','nano181','nano102'],
           'another dictionary' : {'a' : 'b', 'c' : 'd' }
           }
# To access a value, put its key in square brackets.
print(my_dict['name'])

# To reach into a dictionary stored as a value, chain a second [].
print(my_dict['another dictionary']['a'])

# Get the keys (a dict_keys view, which you can loop over like a list)
print(my_dict.keys())

# Get the values (a dict_values view)
print(my_dict.values())

# An important data format is a list of dictionaries that all share the same keys. You will see why later.

Ethan
b
dict_keys(['name', 'major', 'year', 'courses', 'another dictionary'])
dict_values(['Ethan', 'Nanoengineering', 3, ['nano4', 'nano11', 'nano181', 'nano102'], {'a': 'b', 'c': 'd'}])
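Here is why a list of dictionaries with identical keys is so useful: the shared keys behave like column headers and each dictionary behaves like one row of a table. The records below are made up for illustration; the sketch also shows `.items()` for looping over key-value pairs and `.get()` for reading a key that may be missing.

```python
# Made-up student records: every dictionary shares the same keys
students = [
    {"name": "Ethan", "major": "Nanoengineering", "year": 3},
    {"name": "Ava", "major": "Chemistry", "year": 2},
    {"name": "Liam", "major": "Physics", "year": 4},
]

# The shared keys act like column headers...
columns = list(students[0].keys())
print(columns)  # ['name', 'major', 'year']

# ...and each dictionary acts like one row
for row in students:
    print([row[col] for col in columns])

# Pull out a single "column" with a list comprehension
years = [s["year"] for s in students]
print(years)  # [3, 2, 4]

# .items() yields (key, value) pairs
for key, value in students[0].items():
    print(key, "->", value)

# .get() returns a default instead of raising KeyError for a missing key
print(students[0].get("gpa", "not recorded"))  # not recorded
```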


# JavaScript Object Notation (JSON)
<br>JSON is a popular text-based file format, typically used to store lists or maps.

In [None]:
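import json
import pprint  # pretty-printer for nested data such as dictionaries

# Conveniently, Python lists and dictionaries can be converted to JSON, as long as they contain only
# JSON-compatible values (strings, numbers, booleans, None, and nested lists/dicts).

my_list = [1,2,3,4,5]

# Let's write some JSON files. json.dumps() converts the object to a JSON-formatted string.
json_object = json.dumps(my_list)

with open("my_list.json", "w") as outfile:
    outfile.write(json_object)

# Again with a dictionary
json_object2 = json.dumps(my_dict)

with open("my_dict.json", "w") as outfile:
    outfile.write(json_object2)

# Let's read one of those files back. json.load() parses the file into Python objects.
with open("my_list.json", "r") as f:
    my_list_read = json.load(f)
print(my_list_read)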



[1, 2, 3, 4, 5]
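Converting to JSON is not always a perfect round trip, because JSON has fewer data types than Python. This sketch, using a made-up dictionary, shows what the standard `json` module does with integer keys, tuples, `True`, and `None`, and what happens with a type JSON cannot represent.

```python
import json

data = {1: (2, 3), "ok": True, "nothing": None}  # made-up example

# Integer keys become strings, tuples become lists, True -> true, None -> null
text = json.dumps(data)
print(text)  # {"1": [2, 3], "ok": true, "nothing": null}

# Reading it back does not give exactly the original object
restored = json.loads(text)
print(restored)  # {'1': [2, 3], 'ok': True, 'nothing': None}

# Types with no JSON equivalent, such as sets, raise TypeError
try:
    json.dumps({1, 2})
except TypeError as err:
    print("TypeError:", err)

# indent= makes the output easier for humans to read
print(json.dumps(restored, indent=2))
```

As an alternative to `json.dumps()` followed by `write()`, `json.dump(obj, f)` writes straight to an open file.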


In [None]:
# Activity: read the JSON dictionary we just made.

# Comma Separated Values (.csv)
<br>CSV files are plain text files in which each line is a row and the values in each row are separated by commas. Every line normally has the same number of comma-separated values.<br>
They are very useful for representing rows and columns, just like an Excel spreadsheet. In fact, CSV is a good way to export pandas DataFrames so they can be opened in Excel or Google Sheets, and vice versa.
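To see the format itself without pandas, Python's built-in `csv` module can write and read these files. A minimal sketch, assuming two shortened rows borrowed from the data below and an in-memory `io.StringIO` buffer in place of a real file (it offers the same read/write methods as an opened file):

```python
import csv
import io

rows = [
    {"compound": "Si34", "Egap": 0.5324},
    {"compound": "K8Te12", "Egap": 0.5325},
]

# Write a header line, then one comma-separated line per dictionary
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["compound", "Egap"])
writer.writeheader()
writer.writerows(rows)

text = buffer.getvalue()
print(text)

# Read it back: each line becomes a dictionary, but every value is a string
read_rows = list(csv.DictReader(io.StringIO(text)))
print(read_rows)
```

Note that the `csv` module ends lines with `\r\n` by default, and its documentation recommends opening real files with `newline=''` when using it.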

In [None]:
import pandas as pd
# A list of dictionaries that all share the same keys (two materials entries from the AFLOW database)
list_of_dicts = [{'compound': 'Si34', 'auid': 'aflow:336d342eccc91ed6', 'aurl': 'aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/FCC/Si136_ICSD_56721', 'spacegroup_relax': 227, 'Pearson_symbol_relax': 'cF136', 'Egap': 0.5324, 'catalog': 'ICSD', 'energy_atom': -5.08412, 'prototype': 'Si136_ICSD_56721', 'enthalpy_formation_atom': 0.339606, 'ael_bulk_modulus_reuss': 61.4778}, {'compound': 'K8Te12', 'auid': 'aflow:cf40cb0bd363aec2', 'aurl': 'aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/ORC/K2Te3_ICSD_2453', 'spacegroup_relax': 62, 'Pearson_symbol_relax': 'oP20', 'Egap': 0.5325, 'catalog': 'ICSD', 'energy_atom': -3.04685, 'prototype': 'K2Te3_ICSD_2453', 'enthalpy_formation_atom': -0.723416, 'ael_bulk_modulus_reuss': 14.6771}]

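# Convert to a DataFrame: each dictionary becomes a row and each key becomes a column
my_df = pd.DataFrame(list_of_dicts)
display(my_df)

# Export as CSV. By default the index (0, 1, ...) is also written, as an unnamed first column.

my_df.to_csv('my_csv.csv')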



| | compound | auid | aurl | spacegroup_relax | Pearson_symbol_relax | Egap | catalog | energy_atom | prototype | enthalpy_formation_atom | ael_bulk_modulus_reuss |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Si34 | aflow:336d342eccc91ed6 | aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/FCC/Si136... | 227 | cF136 | 0.5324 | ICSD | -5.08412 | Si136_ICSD_56721 | 0.339606 | 61.4778 |
| 1 | K8Te12 | aflow:cf40cb0bd363aec2 | aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/ORC/K2Te3... | 62 | oP20 | 0.5325 | ICSD | -3.04685 | K2Te3_ICSD_2453 | -0.723416 | 14.6771 |


In [None]:
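# Reading a CSV: CSV --> DataFrame
# The unnamed first column holds the index we wrote out, so pandas labels it 'Unnamed: 0'; index_col turns it back into the index.

my_df2 = pd.read_csv('my_csv.csv', index_col='Unnamed: 0')
display(my_df2)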


| Unnamed: 0 | compound | auid | aurl | spacegroup_relax | Pearson_symbol_relax | Egap | catalog | energy_atom | prototype | enthalpy_formation_atom | ael_bulk_modulus_reuss |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Si34 | aflow:336d342eccc91ed6 | aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/FCC/Si136... | 227 | cF136 | 0.5324 | ICSD | -5.08412 | Si136_ICSD_56721 | 0.339606 | 61.4778 |
| 1 | K8Te12 | aflow:cf40cb0bd363aec2 | aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/ORC/K2Te3... | 62 | oP20 | 0.5325 | ICSD | -3.04685 | K2Te3_ICSD_2453 | -0.723416 | 14.6771 |
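The `Unnamed: 0` column appears because `to_csv` writes the index as an extra first column with an empty header, and `read_csv` names a blank header `Unnamed: 0`. If the index carries no information, another option is to leave it out with `index=False`. A minimal sketch using a shortened, two-column version of the data above; calling `to_csv()` without a path returns the CSV text as a string instead of writing a file.

```python
import io

import pandas as pd

df = pd.DataFrame([
    {"compound": "Si34", "Egap": 0.5324},
    {"compound": "K8Te12", "Egap": 0.5325},
])

# Default: the index is written as an extra column with an empty header
with_index = df.to_csv()
print(with_index.splitlines()[0])  # ,compound,Egap

# index=False leaves the index out entirely
without_index = df.to_csv(index=False)
print(without_index.splitlines()[0])  # compound,Egap

# Reading the index-free CSV gives back the original columns, with no "Unnamed: 0"
df_back = pd.read_csv(io.StringIO(without_index))
print(list(df_back.columns))  # ['compound', 'Egap']
```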


# Running Shell Commands
<br>Python can run shell commands using the built-in `subprocess` module.

This is useful for integrating Python with any program that has a command-line interface (CLI), such as programs normally run as Unix commands.
<br>We do not need this for this lab in Google Colab, but in later projects it may be useful for automating tasks involving CLIs or Unix commands that require custom inputs.
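As a sketch of what "custom inputs" can look like, `subprocess.run` can send text to a program's standard input through its `input` parameter. The child program here is an assumption chosen so the example runs anywhere: a one-line Python script, started with the current interpreter (`sys.executable`), that upper-cases whatever it receives. A real CLI would take its own command and input instead.

```python
import subprocess
import sys

# Placeholder child program: read all of stdin and print it in upper case
child_code = "import sys; print(sys.stdin.read().upper(), end='')"

result = subprocess.run(
    [sys.executable, "-c", child_code],  # list form: no shell involved
    input="custom input\n",              # text sent to the child's stdin
    capture_output=True,                 # collect stdout and stderr
    text=True,                           # work with str instead of bytes
    check=True,                          # raise CalledProcessError on failure
)
print(result.stdout)  # CUSTOM INPUT
```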


In [None]:
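# Let us first use an older approach. os.popen still works, but the subprocess module is the recommended replacement.
os.popen("ls")

# This ran the command, but all we got back is a file-like wrapper object (shown below), not the command's output.
# We would have to call .read() on that object to get the text, which this line never does.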

<os._wrap_close at 0x7f53c2751a00>
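For comparison, here is a sketch that actually reads the output, first from `os.popen` and then with the recommended `subprocess.run`. As a portable stand-in for `ls` (an assumption, so the example behaves the same everywhere), it runs the current Python interpreter with a one-line script.

```python
import os
import subprocess
import sys

# os.popen returns a file-like wrapper; .read() returns the command's output.
# The double quotes protect the interpreter path if it contains spaces.
with os.popen(f'"{sys.executable}" -c "print(42)"') as pipe:
    popen_output = pipe.read()
print(repr(popen_output))  # '42\n'

# subprocess equivalent: arguments as a list (no shell quoting needed)
# and text=True so stdout is already a str
result = subprocess.run(
    [sys.executable, "-c", "print(42)"],
    capture_output=True,
    text=True,
    check=True,
)
print(repr(result.stdout))  # '42\n'
```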

In [None]:
import subprocess
command = 'ls' # Recall from earlier that 'ls' is the Unix command to list files. This is just a placeholder.

result = subprocess.run(command, shell=True, capture_output=True)
# The captured output is a bytes object, a different data type from a regular str,
# so we decode it as UTF-8 text to get a normal string.
output = result.stdout.decode('UTF-8')
print(output)

i love u_0.txt
i love u_100.txt
i love u_101.txt
i love u_102.txt
i love u_103.txt
i love u_104.txt
i love u_105.txt
i love u_106.txt
i love u_107.txt
i love u_108.txt
i love u_109.txt
i love u_10.txt
i love u_110.txt
i love u_111.txt
i love u_112.txt
i love u_113.txt
i love u_114.txt
i love u_115.txt
i love u_116.txt
i love u_117.txt
i love u_118.txt
i love u_119.txt
i love u_11.txt
i love u_120.txt
i love u_121.txt
i love u_122.txt
i love u_123.txt
i love u_124.txt
i love u_125.txt
i love u_126.txt
i love u_127.txt
i love u_128.txt
i love u_129.txt
i love u_12.txt
i love u_130.txt
i love u_131.txt
i love u_132.txt
i love u_133.txt
i love u_134.txt
i love u_135.txt
i love u_136.txt
i love u_137.txt
i love u_138.txt
i love u_139.txt
i love u_13.txt
i love u_140.txt
i love u_141.txt
i love u_142.txt
i love u_143.txt
i love u_144.txt
i love u_145.txt
i love u_146.txt
i love u_147.txt
i love u_148.txt
i love u_149.txt
i love u_14.txt
i love u_150.txt
i love u_151.txt
i love u_152.txt
i lo