# Working with files


## Contents

0. Install packages
1. Writing to a text file
2. Reading from a text file
3. JSON
4. YAML
5. Pickle
6. XML to dict / JSON
6a. JSON to UBL-XML
7. Credentials file (simple)
8. Python dict to XML

## 0. Install packages

In [2]:
%pip install pyyaml 

Note: you may need to restart the kernel to use updated packages.


In [4]:
# we will use glob to list all files of a certain file type
import glob
png_files = glob.glob('*.png')
png_files

['anatomy of an array.png', 'opencv.png', 'plot.png', 'road2.png']
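The standard library's `pathlib` offers the same pattern matching through `Path.glob`. This is a minimal sketch that assumes you only want file names (not folders), sorted alphabetically. The helper name `list_files` is hypothetical:

```python
from pathlib import Path


def list_files(pattern, folder="."):
    """Return the sorted names of files in `folder` that match a glob pattern like '*.png'."""
    # Path.glob also matches folders, so keep only regular files
    return sorted(p.name for p in Path(folder).glob(pattern) if p.is_file())


print(list_files("*.png"))
```

Sorting is a deliberate choice here: `glob.glob` returns names in whatever order the file system provides.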

#### Another way is IPython's ls command (on Windows it runs dir, as the output below shows)

In [7]:
ls *.txt 

 Volume in drive C is OS
 Volume Serial Number is 2818-58FB

 Directory of C:\Users\31653\Documents\GitHub\Notebooks

06/02/2021  17:07                84 accounts.txt
23/06/2021  20:11                84 demo.txt
26/07/2021  15:45                59 servers.txt
17/04/2021  21:30                 0 untitled.txt
17/04/2021  21:31                16 untitled1.txt
               5 File(s)            243 bytes
               0 Dir(s)  286.964.838.400 bytes free


## 1. Writing to a text file

In [13]:
# this script creates accounts.txt with five lines, one account record per line
# mode='w' overwrites the file if it already exists
with open('accounts.txt', mode='w') as accounts:
    accounts.write('100 Jones 24.98\n')
    accounts.write('200 Doe 345.67\n')
    accounts.write('300 White 0.00\n')
    print('400 Stone -42.16', file=accounts)  # print() adds the newline for you
    print('500 Rich 224.62', file=accounts)
# the with block closes the file automatically
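Mode `'w'` empties an existing file before writing, while mode `'a'` appends to it. A small sketch of that difference, using a hypothetical helper `write_records`, a hypothetical file name `accounts_demo.txt` and the same record layout as above:

```python
def write_records(path, records, mode="w"):
    """Write (account, name, balance) tuples as 'account name balance' lines.

    mode="w" empties the file first; mode="a" adds lines to the end.
    """
    with open(path, mode=mode) as f:
        for account, name, balance in records:
            # print() separates its arguments with spaces and ends with a newline
            print(account, name, f"{balance:.2f}", file=f)


write_records("accounts_demo.txt", [(100, "Jones", 24.98)])
write_records("accounts_demo.txt", [(200, "Doe", 345.67)], mode="a")
with open("accounts_demo.txt") as f:
    print(f.read())
```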

In [15]:
pwd

'C:\\Users\\31653\\Documents\\GitHub\\Notebooks'

In [28]:
import glob
my_txt_files = glob.glob('*.txt')
my_txt_files

['accounts.txt',
 'accounts2.txt',
 'a_file.txt',
 'demo.txt',
 'NOAA_data.txt',
 'plantuml.txt',
 'servers.txt',
 'XML_file.txt']

In [19]:
#display the file using %pycat (accounts.txt is in the working directory shown by pwd)
%pycat accounts.txt

In [24]:
#or display it using python (without a with block you must close the file yourself)
text_file = open('accounts.txt')
file_content = text_file.read()
print(file_content)
text_file.close()

100 Jones 24.98
200 Doe 345.67
300 White 0.00
400 Stone -42.16
500 Rich 224.62



### 1b. Converting a Python list to a .txt file

In [32]:
a_list = ["abc", "def", "ghi"]
with open("a_file.txt", "w") as textfile:
    textfile.write('header\n')
    for element in a_list:
        textfile.write(element + "\n")

In [33]:
%pycat a_file.txt

## 2. Reading Data from a text file

In [2]:
with open('accounts.txt', mode='r') as accounts:
    # three header columns, each 10 characters wide: < aligns left, > aligns right
    print(f'{"Account":<10}{"Name":<10}{"Balance":>10}')
    for record in accounts:  # process one line (record) at a time
        account, name, balance = record.split()
        print(f'{account:<10}{name:<10}{balance:>10}')

Account   Name         Balance
100       Jones          24.98
200       Doe           345.67
300       White           0.00
400       Stone         -42.16
500       Rich          224.62
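Splitting each line is also the first step towards computing with the data. A sketch, assuming every non-empty line has exactly three whitespace-separated fields; `read_accounts` is a hypothetical helper. It converts the balance with `decimal.Decimal` so that sums of money stay exact (a float sum can pick up rounding noise):

```python
from decimal import Decimal


def read_accounts(path):
    """Parse lines like '100 Jones 24.98' into (int, str, Decimal) tuples."""
    records = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            account, name, balance = line.split()
            records.append((int(account), name, Decimal(balance)))
    return records
```

With the accounts.txt written above, `sum(balance for _, _, balance in read_accounts('accounts.txt'))` adds up the five balances exactly.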


In [20]:
with open('accounts.txt') as input_file:
    line = input_file.readline()
    while line:
        line = line.strip()
        print(line)
        line = input_file.readline()

100 Jones 24.98
200 Doe 345.67
300 White 0.00
400 Stone -42.16
500 Rich 224.62


In [26]:
# copy accounts.txt to demo.txt line by line and print how often 'Doe' occurs in each line
with open('accounts.txt') as input_file:
    with open('demo.txt', 'w') as output_file:
        line = input_file.readline()
        while line:
            print(line.count('Doe'))
            line = line.strip()
            print(line)
            output_file.write(line + '\n')
            line = input_file.readline()

0
100 Jones 24.98
1
200 Doe 345.67
0
300 White 0.00
0
400 Stone -42.16
0
500 Rich 224.62


In [24]:
# after the loop, line is '' (end of file), so the count is 0
line.count('Doe')

0

In [22]:
ls *.txt

 Volume in drive C is OS
 Volume Serial Number is 2818-58FB

 Directory of C:\Users\31653\Documents\GitHub\Notebooks

06/02/2021  17:07                84 accounts.txt
23/06/2021  19:52                 0 demo.txt
17/04/2021  21:30                 0 untitled.txt
17/04/2021  21:31                16 untitled1.txt
               4 File(s)            100 bytes
               0 Dir(s)  281.730.273.280 bytes free


## 3. JSON
source: https://www.programiz.com/python-programming/json

In Python 3:
 - json.loads takes a JSON string as input and returns the matching Python object (a dict for a JSON object).
 - json.dumps takes a Python object (for example a dict) as input and returns a JSON string.
 - json.load reads JSON from an open file object (json.dump writes JSON to one).
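A minimal round trip through all four functions. The file name `person_demo.json` in the system temp folder is a throwaway assumption:

```python
import json
import os
import tempfile

person = {"name": "Bob", "languages": ["English", "French"]}

text = json.dumps(person)            # Python dict -> JSON string
print(text)
print(json.loads(text) == person)   # JSON string -> Python dict

# hypothetical file in the system temp folder
path = os.path.join(tempfile.gettempdir(), "person_demo.json")
with open(path, "w") as f:
    json.dump(person, f, indent=2)   # Python dict -> JSON file
with open(path) as f:
    restored = json.load(f)          # JSON file -> Python dict
print(restored)
```

The "s" in loads/dumps stands for "string": those two work on text in memory, while load/dump work on file objects.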

In [1]:
import glob
my_jsons = glob.glob('*.json')
my_jsons

['edgeimpulse.json',
 'person.json',
 'petstore_openapi3.json',
 'plane.json',
 'sarcasm.json']

In [4]:
import json
try:
    data = json.loads('person.json')  # 'person.json' is a file name, not JSON text
except json.JSONDecodeError:
    print('json.loads gives an error')

json.loads gives an error


In [9]:
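import json
# json.dumps serializes its argument: the Python string 'person.json'
# becomes the JSON string "person.json" (the file is never opened)
data = json.dumps('person.json')
print(data)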

"person.json"


In [11]:
# to open the json file use json.load
import json

with open('person.json') as f:
    data = json.load(f)

# Output: {'name': 'Bob', 'languages': ['English', 'French']}
print(data)

{'name': 'Bob', 'languages': ['English', 'French']}


In [2]:
import json

person = '{"name": "Michiel", "languages": ["English", "French", "Italian"]}'
person_dict = json.loads(person)

# Output: {'name': 'Michiel', 'languages': ['English', 'French', 'Italian']}
print(person_dict)

# Output: English and French
print(person_dict['languages'][0] + " and " + person_dict['languages'][1])
print(type(person_dict))

{'name': 'Michiel', 'languages': ['English', 'French', 'Italian']}
English and French
<class 'dict'>


person_dict.keys()

dict_keys(['name', 'languages'])

In [47]:
import json
with open('edgeimpulse.json') as f:
    data = json.load(f)

list_item_0 = data['result']['bounding_boxes'][0]  # first detected object
print(list_item_0)
print(list_item_0['value'], list_item_0['label'])
print(list_item_0['label'])
#type(list_item_0)

{'height': 252, 'label': 'zebra', 'value': 0.9409910440444946, 'width': 201, 'x': 13, 'y': 38}
0.9409910440444946 zebra
zebra
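Index `[0]` only returns the first box. To look at all detections, loop over the list. A sketch, assuming the `result` → `bounding_boxes` layout shown above and a hypothetical confidence cut-off `threshold`; `boxes_above` is an illustrative name:

```python
def boxes_above(data, threshold=0.5):
    """Return (label, value) for every bounding box whose confidence value exceeds threshold."""
    return [
        (box["label"], box["value"])
        for box in data["result"]["bounding_boxes"]
        if box["value"] > threshold
    ]


# same structure as the edgeimpulse.json output above (only one box is shown there)
example = {"result": {"bounding_boxes": [
    {"height": 252, "label": "zebra", "value": 0.9409910440444946,
     "width": 201, "x": 13, "y": 38},
]}}
print(boxes_above(example))
```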


In [1]:
import json
json_str = '{"a":  1, "a":  2, "a":  3, "a": 4, "b": 1, "b": 2, "c": 0}'  # a JSON string with duplicate keys
json_obj = json.loads(json_str)  # for duplicate keys, the last value wins
print("\nUnique keys in the JSON object:")
print(json_obj)


Unique keys in the JSON object:
{'a': 4, 'b': 2, 'c': 0}
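Silently keeping the last value can hide mistakes in hand-written JSON. `json.loads` accepts an `object_pairs_hook` that receives every key/value pair of each JSON object, so duplicates can be detected. A sketch; the hook name `reject_duplicates` is hypothetical:

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises an error instead of keeping the last duplicate."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key in JSON object: {key!r}")
        result[key] = value
    return result


print(json.loads('{"a": 1, "b": 2}', object_pairs_hook=reject_duplicates))
try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)
```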


In [2]:
type(json_obj)

dict

In [7]:
#get only the keys as this is a dict
json_obj.keys()

dict_keys(['a', 'b', 'c'])

In [27]:
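# Example from Gijs: JSON types and the Python types json.loads turns them into
import json
json_string = '{"my_boolean": true, "my_int": 3, "my_float": 1.2, "my_none": null, "my_string": "hello world"}'
struct_1 = json.loads(json_string)  # json.loads needs a str; passing a dict raises TypeError

for key, value in struct_1.items():
    print(f"{key}: type={type(value)}, value={value}")

my_boolean: type=<class 'bool'>, value=True
my_int: type=<class 'int'>, value=3
my_float: type=<class 'float'>, value=1.2
my_none: type=<class 'NoneType'>, value=None
my_string: type=<class 'str'>, value=hello world

In [6]:
json_string

'{"my_boolean": true, "my_int": 3, "my_float": 1.2, "my_none": null, "my_string": "hello world"}'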

In [4]:
# how to check if a key is in a dictionary
# source: https://www.kite.com/python/answers/how-to-check-if-a-key-exists-in-a-json-string-in-python
json_string = """{"a": 1, "b": 2, "c": 3}"""
a_dictionary = json.loads(json_string)

b_in_dict =  "b" in a_dictionary

print(b_in_dict)


True
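`in` only checks the top level of a dict. For nested data such as the edgeimpulse result above, a small helper can walk a path of keys and list indices. A sketch; the name `has_path` is hypothetical:

```python
def has_path(obj, *keys):
    """Return True if obj[keys[0]][keys[1]]... can be looked up without an error."""
    for key in keys:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return False
    return True


data = {"result": {"bounding_boxes": [{"label": "zebra"}]}}
print(has_path(data, "result", "bounding_boxes", 0, "label"))  # True
print(has_path(data, "result", "classification"))              # False
```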


In [11]:
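import json

# test_json (defined in an earlier cell, not shown) is already a Python dict:
# passing it to open() raised "TypeError: expected str, bytes or os.PathLike object, not dict".
# Read the "emp_details" list from the dict directly instead.
# (If test_json were a path to a .json file, use: with open(test_json) as f: data = json.load(f))
data = test_json
jsonData = data["emp_details"]
for x in jsonData:
    keys = x.keys()
    print(keys)
    values = x.values()
    print(values)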

## 4. YAML

Pypi: https://pypi.org/project/PyYAML/

source: https://zetcode.com/python/yaml/

Use the following files: items.yaml and data.yaml.

items.yaml:
raincoat: 1
coins: 5
books: 23
spectacles: 2
chairs: 12
pens: 6

In [4]:
import yaml

with open('items.yaml') as f:
    
    data = yaml.load(f, Loader=yaml.FullLoader)
    print(data)

{'raincoat': 1, 'coins': 5, 'books': 23, 'spectacles': 2, 'chairs': 12, 'pens': 6}
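`yaml.safe_load` and `yaml.safe_dump` are the restricted counterparts that only handle plain Python types (dicts, lists, strings, numbers and so on), which makes them the safer choice for files you did not write yourself. A minimal round-trip sketch, assuming PyYAML 5.1 or newer (for the `sort_keys` argument):

```python
import yaml

items = {"raincoat": 1, "coins": 5, "books": 23}

text = yaml.safe_dump(items, sort_keys=False)  # dict -> YAML text, keeping insertion order
print(text)

loaded = yaml.safe_load(text)                  # YAML text -> dict
print(loaded == items)
```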


In [5]:
type(data)

dict

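data.yaml (the --- line separates two YAML documents in one file, which is why it is read with yaml.load_all):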
cities:
  - Bratislava
  - Kosice
  - Trnava
  - Moldava
  - Trencin
---
companies:
  - Eset
  - Slovnaft
  - Duslo Sala
  - Matador Puchov

In [14]:
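#!/usr/bin/env python3

import yaml

with open('data.yaml') as f:
    # load_all yields one Python object per YAML document (separated by ---);
    # it reads lazily, so iterate while the file is still open
    docs = yaml.load_all(f, Loader=yaml.FullLoader)

    for doc in docs:
        for k, v in doc.items():
            print(k, "->", v)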

cities -> ['Bratislava', 'Kosice', 'Trnava', 'Moldava', 'Trencin']
companies -> ['Eset', 'Slovnaft', 'Duslo Sala', 'Matador Puchov']
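`yaml.safe_load_all` is the safe counterpart of `load_all` and also works on a string. A sketch using a shortened inline copy of data.yaml; wrapping the generator in `list()` reads every document at once:

```python
import yaml

multi_doc = """\
cities:
  - Bratislava
  - Kosice
---
companies:
  - Eset
  - Slovnaft
"""

# safe_load_all returns a lazy generator; list() collects every document
docs = list(yaml.safe_load_all(multi_doc))
print(len(docs))
print(docs)
```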


## 5. Pickle

Pickle is Python's native object serialization module.  

Pickle official information: https://docs.python.org/3/library/pickle.html
Pickle tutorial: https://www.datacamp.com/community/tutorials/pickle-python-tutorial

#### Pickle vs JSON
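- Pickle pros: Python-native, and it can serialize most Python objects (sets, tuples, instances of your own classes)
- JSON pros: interoperability (any language can read it), human-readable text, and security: unpickling untrusted data can execute arbitrary code, while loading JSON only builds plain data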
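The "Python-native" point in practice: a set has no JSON equivalent, but pickle restores it (and a tuple) with its original type. A minimal sketch using the in-memory `pickle.dumps`/`pickle.loads`; only unpickle data you trust:

```python
import json
import pickle

data = {"colors": {"red", "yellow"}, "point": (1, 2)}  # a set and a tuple

restored = pickle.loads(pickle.dumps(data))  # bytes round trip keeps the Python types
print(restored == data, type(restored["colors"]), type(restored["point"]))

try:
    json.dumps(data)
except TypeError as err:
    print("JSON cannot encode this:", err)
```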

In [3]:
import pickle

In [7]:
# Save a dictionary into a pickle file.
import pickle

favorite_color = {"lion": "yellow", "kitty": "red"}
with open("save.pkl", "wb") as f:  # pickle needs binary mode
    pickle.dump(favorite_color, f)

In [8]:
# Open the pickle file and assign to a new variable
with open("save.pkl", "rb") as f:
    favorite_color_new = pickle.load(f)
favorite_color_new

{'lion': 'yellow', 'kitty': 'red'}

## 6. XML to dict / JSON
source: https://www.askpython.com/python-modules/xmltodict-module#:~:text=We%20can%20convert%20XML%20files%20to%20a%20Python,Ordered%20Dictionary%20using%20dict%20constructor%20for%20Python%20dictionaries.

In [1]:
%pip install xmltodict

Note: you may need to restart the kernel to use updated packages.


In [2]:
import xmltodict
import json

xml='''<website>
        <name>Codespeedy</name>
        <article>Related to programming</article>
        <message>You can learn easily from codespeedy</message>
    </website>'''

my_dict=xmltodict.parse(xml)
json_data=json.dumps(my_dict)
print(json_data)

{"website": {"name": "Codespeedy", "article": "Related to programming", "message": "You can learn easily from codespeedy"}}
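If installing xmltodict is not an option, the standard library's `xml.etree.ElementTree` can produce a similar nested dict for simple documents. This is a simplified sketch, not xmltodict's implementation: it ignores attributes and mixed text, and `xml_to_dict` is a hypothetical name. Like xmltodict, it turns repeated tags into a list:

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an Element to a str (leaf), None (empty leaf) or dict (has children)."""
    children = list(elem)
    if not children:
        text = elem.text.strip() if elem.text else ""
        return text or None
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # repeated tag: collect all values in a list
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def xml_to_dict(xml_text):
    root = ET.fromstring(xml_text)
    return {root.tag: element_to_dict(root)}


print(xml_to_dict("<website><name>Codespeedy</name><article>Related to programming</article></website>"))
```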


In [3]:
my_xml='''<emp_bank_tx>
         <ebnkt_bank_def_id>BVKZ</ebnkt_bank_def_id>
         <ebnkt_employee_id>99999901</ebnkt_employee_id>
         <ebnkt_transaction_date>2021-03-01</ebnkt_transaction_date>
         <ebnkt_transaction_type>3</ebnkt_transaction_type>
         <ebnkt_amount>56h00</ebnkt_amount>
         <ebnkt_reason_code>ECALC</ebnkt_reason_code>
        </emp_bank_tx>'''

In [16]:
#copy paste your xml message here to convert it to a dict:
my_xml= '''
 <message>
                <calculateTimeslotRequest>
                    <timeslotParameters>
                        <fromDate>2021-09-10</fromDate>
                        <tillDate>2021-09-17</tillDate>
                    </timeslotParameters>
                    <orderData>
                        <requestOnly>
                            <id>386210</id>
                            <address>
                                <streetName>Dassenberg</streetName>
                                <doorNumber>56</doorNumber>
                                <zipcode>3825BC</zipcode>
                                <city>AMERSFOORT</city>
                                <countryCode>NL</countryCode>
                            </address>
                            <duration>45</duration>
                            <requiredCapabilities>
                                <capability>Loodgiet</capability>
                                <capability>AMF</capability>
                            </requiredCapabilities>
                            <planRegion>306510</planRegion>
                        </requestOnly>
                    </orderData>
                </calculateTimeslotRequest>
            </message>
'''
print(my_xml)


 <message>
                <calculateTimeslotRequest>
                    <timeslotParameters>
                        <fromDate>2021-09-10</fromDate>
                        <tillDate>2021-09-17</tillDate>
                    </timeslotParameters>
                    <orderData>
                        <requestOnly>
                            <id>386210</id>
                            <address>
                                <streetName>Dassenberg</streetName>
                                <doorNumber>56</doorNumber>
                                <zipcode>3825BC</zipcode>
                                <city>AMERSFOORT</city>
                                <countryCode>NL</countryCode>
                            </address>
                            <duration>45</duration>
                            <requiredCapabilities>
                                <capability>Loodgiet</capability>
                                <capability>AMF</capability>
                         

In [30]:
import xmltodict
import json

xml=my_xml

my_dict=xmltodict.parse(xml)
json_data=json.dumps(my_dict)
print(json_data)

{"message": {"calculateTimeslotRequest": {"timeslotParameters": {"fromDate": "2021-09-10", "tillDate": "2021-09-17"}, "orderData": {"requestOnly": {"id": "386210", "address": {"streetName": "Dassenberg", "doorNumber": "56", "zipcode": "3825BC", "city": "AMERSFOORT", "countryCode": "NL"}, "duration": "45", "requiredCapabilities": {"capability": ["Loodgiet", "AMF"]}, "planRegion": "306510"}}}}}


In [36]:
json_obj = json.loads(json_data)
json_obj['message'].keys()

dict_keys(['calculateTimeslotRequest'])

In [19]:
my_dict.keys()

odict_keys(['message'])

In [26]:
for key in my_dict.keys(): #use a for loop to print the dictionary
    value = my_dict[key]
    print(key, "=", value)

message = OrderedDict([('calculateTimeslotRequest', OrderedDict([('timeslotParameters', OrderedDict([('fromDate', '2021-09-10'), ('tillDate', '2021-09-17')])), ('orderData', OrderedDict([('requestOnly', OrderedDict([('id', '386210'), ('address', OrderedDict([('streetName', 'Dassenberg'), ('doorNumber', '56'), ('zipcode', '3825BC'), ('city', 'AMERSFOORT'), ('countryCode', 'NL')])), ('duration', '45'), ('requiredCapabilities', OrderedDict([('capability', ['Loodgiet', 'AMF'])])), ('planRegion', '306510')]))]))]))])
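Depending on the xmltodict version, nested values may come back as `OrderedDict` objects (as above) or as plain dicts. If you prefer plain dicts, for example for cleaner printing, a small recursive converter is enough. A sketch; `to_plain_dict` is a hypothetical helper:

```python
from collections import OrderedDict


def to_plain_dict(obj):
    """Recursively convert OrderedDicts (and other dict subclasses) into plain dicts."""
    if isinstance(obj, dict):  # OrderedDict is a subclass of dict
        return {key: to_plain_dict(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [to_plain_dict(item) for item in obj]
    return obj


nested = OrderedDict([("fromDate", "2021-09-10"), ("tillDate", "2021-09-17")])
print(to_plain_dict(OrderedDict([("timeslotParameters", nested)])))
```

Applied to the notebook's data this would be `to_plain_dict(my_dict)`.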


### XML to JSON

In [25]:
#source: https://www.askpython.com/python-modules/xmltodict-module#:~:text=We%20can%20convert%20XML%20files%20to%20a%20Python,Ordered%20Dictionary%20using%20dict%20constructor%20for%20Python%20dictionaries
#import module
import xmltodict
import json
 
#open the file
#   fileptr = open("/home/aditya1117/askpython/plane.xml","r")
 
#read xml content from the file
#   xml_content= fileptr.read()
#print("XML content is:")
#print(xml_content)

xml = my_xml
 
#change xml format to ordered dict
my_ordered_dict = xmltodict.parse(xml)
print("Ordered Dictionary is:")
print(my_ordered_dict)
json_data = json.dumps(my_ordered_dict)  # serialize the dict to a JSON string
print("JSON data is:")
print(json_data)
with open("plane.json", "w") as x:  # save the JSON string to plane.json
    x.write(json_data)
my_ordered_dict.keys()

Ordered Dictionary is:
OrderedDict([('message', OrderedDict([('calculateTimeslotRequest', OrderedDict([('timeslotParameters', OrderedDict([('fromDate', '2021-09-10'), ('tillDate', '2021-09-17')])), ('orderData', OrderedDict([('requestOnly', OrderedDict([('id', '386210'), ('address', OrderedDict([('streetName', 'Dassenberg'), ('doorNumber', '56'), ('zipcode', '3825BC'), ('city', 'AMERSFOORT'), ('countryCode', 'NL')])), ('duration', '45'), ('requiredCapabilities', OrderedDict([('capability', ['Loodgiet', 'AMF'])])), ('planRegion', '306510')]))]))]))]))])
JSON data is:
{"message": {"calculateTimeslotRequest": {"timeslotParameters": {"fromDate": "2021-09-10", "tillDate": "2021-09-17"}, "orderData": {"requestOnly": {"id": "386210", "address": {"streetName": "Dassenberg", "doorNumber": "56", "zipcode": "3825BC", "city": "AMERSFOORT", "countryCode": "NL"}, "duration": "45", "requiredCapabilities": {"capability": ["Loodgiet", "AMF"]}, "planRegion": "306510"}}}}}


odict_keys(['message'])

### Write a list to a text file (XML_file.txt)

In [None]:
#source: https://www.kite.com/python/answers/how-to-write-a-list-to-a-file-in-python
xml_list = []  # fill with the lines to write, e.g. XML elements as strings
with open("XML_file.txt", "w") as textfile:
    for element in xml_list:
        textfile.write(element + "\n")
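Going from a text file to an actual XML file can be sketched with the standard library's ElementTree: each non-empty line becomes one element. The tag names `lines` and `line` and the helper name `text_file_to_xml` are hypothetical:

```python
import xml.etree.ElementTree as ET


def text_file_to_xml(txt_path, xml_path, root_tag="lines", item_tag="line"):
    """Wrap each non-empty line of a text file in an XML element and save the result."""
    root = ET.Element(root_tag)
    with open(txt_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                ET.SubElement(root, item_tag).text = line
    # write the tree with an <?xml ...?> declaration at the top
    ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=True)
    return root
```

ElementTree escapes characters such as `<` and `&` in the text for you, which plain string concatenation would not.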

## 6a. JSON to UBL-XML
source: https://json-to-ubl-xml-transformer.readthedocs.io/en/latest/installation.html
source: https://github.com/dimitern/json_to_ubl_xml_transformer

In [1]:
!pip install json_to_ubl_xml_transformer

Collecting json_to_ubl_xml_transformer
  Downloading json_to_ubl_xml_transformer-0.2.1-py2.py3-none-any.whl (9.1 kB)
Installing collected packages: json-to-ubl-xml-transformer
Successfully installed json-to-ubl-xml-transformer-0.2.1


In [2]:
import json_to_ubl_xml_transformer

## 7. Credentials file

First create a separate file called 'credentials.py' in the notebook folder, containing:

username = "xy"
password = "abcd"

In [4]:
import credentials
username = credentials.username
password = credentials.password
print(username, password)

xy abcd
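Since this notebook folder appears to be a GitHub repository, keep credentials.py out of version control (for example by listing it in .gitignore). An alternative that keeps secrets out of code files altogether is to read them from environment variables. A sketch; the variable names `APP_USERNAME` and `APP_PASSWORD` are hypothetical:

```python
import os


def get_credentials():
    """Read credentials from the (hypothetical) APP_USERNAME / APP_PASSWORD environment variables."""
    username = os.environ.get("APP_USERNAME")
    password = os.environ.get("APP_PASSWORD")
    if username is None or password is None:
        raise RuntimeError("Set the APP_USERNAME and APP_PASSWORD environment variables first")
    return username, password
```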


## 8. Python dict to XML
source: https://www.geeksforgeeks.org/serialize-python-dictionary-to-xml/

In [2]:
!pip install dict2xml

Collecting dict2xml
  Downloading dict2xml-1.7.1.tar.gz (6.6 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: dict2xml
  Building wheel for dict2xml (pyproject.toml): started
  Building wheel for dict2xml (pyproject.toml): finished with status 'done'
  Created wheel for dict2xml: filename=dict2xml-1.7.1-py3-none-any.whl size=6930 sha256=7d468861b07a272a4ea3b7232fd66ddceabcb04c27426890b59486b8e7ba67f6
  Stored in directory: c:\users\31653\appdata\local\pip\cache\wheels\ab\23\ef\f33d7e60cafeb4f4e62c8d2b76c59875e5d4018d0d69fa85c9
Successfully built dict2xml
Installing collected packages: dict2xml
Successfully installed dict2xml-1.7.1


In [3]:
# Converting Python Dictionary to XML
# with a root element
from dict2xml import dict2xml

data = {'a': 2,
        'b': {
               'c': 'as',
               'f': True},
        'd': 7,
        }

xml = dict2xml(data, wrap='root', indent="   ")
print(xml)

<root>
   <a>2</a>
   <b>
      <c>as</c>
      <f>True</f>
   </b>
   <d>7</d>
</root>
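Without installing dict2xml, the standard library's ElementTree can build the same elements for nested dicts. This is a simplified sketch, not dict2xml's implementation: lists are not handled, and `dict_to_element` is a hypothetical name. Scalars become element text via `str()`, so `True` prints as `True` like above:

```python
import xml.etree.ElementTree as ET


def dict_to_element(tag, value):
    """Build an Element from a (possibly nested) dict; non-dict values become text."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(dict_to_element(key, child))
    else:
        elem.text = str(value)
    return elem


data = {'a': 2, 'b': {'c': 'as', 'f': True}, 'd': 7}
root = dict_to_element("root", data)
print(ET.tostring(root, encoding="unicode"))
```

`ET.tostring` produces a single line; on Python 3.9+ calling `ET.indent(root)` first adds indentation similar to the dict2xml output.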
