## Lecture summary

- Text strings (continued)
- Exceptions and error handling
- File input / output
  - CSV files
  - JSON files

## Python Strings (continued)

[Chapter 6 – Manipulating Strings](https://automatetheboringstuff.com/2e/chapter6/) (Automate the Boring Stuff)

In [1]:
type("abc")

str

In [2]:
for name in dir(str):
    if not name.startswith("_"):
        print(name)

capitalize
casefold
center
count
encode
endswith
expandtabs
find
format
format_map
index
isalnum
isalpha
isascii
isdecimal
isdigit
isidentifier
islower
isnumeric
isprintable
isspace
istitle
isupper
join
ljust
lower
lstrip
maketrans
partition
removeprefix
removesuffix
replace
rfind
rindex
rjust
rpartition
rsplit
rstrip
split
splitlines
startswith
strip
swapcase
title
translate
upper
zfill


In [3]:
help(str.startswith)

Help on method_descriptor:

startswith(...) unbound builtins.str method
    S.startswith(prefix[, start[, end]]) -> bool

    Return True if S starts with the specified prefix, False otherwise.
    With optional start, test S beginning at that position.
    With optional end, stop comparing S at that position.
    prefix can also be a tuple of strings to try.



In [4]:
help(str.strip)

Help on method_descriptor:

strip(self, chars=None, /) unbound builtins.str method
    Return a copy of the string with leading and trailing whitespace removed.

    If chars is given and not None, remove characters in chars instead.



In [6]:
text = "   Uldis   "

In [7]:
# strip removes whitespace from start and end of the string
text.strip()

'Uldis'

In [8]:
text.lstrip()

'Uldis   '

In [9]:
text.rstrip()

'   Uldis'

In [10]:
text

'   Uldis   '

In [11]:
text = text.lstrip()

In [12]:
text

'Uldis   '
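As the `help()` output above says, `strip()` also accepts a `chars` argument. `chars` is treated as a *set* of characters to remove from both ends, not as a substring. To remove an exact prefix or suffix, the `removeprefix()` / `removesuffix()` methods from the `dir(str)` listing (Python 3.9+) are a better fit. A small sketch (the URL-like string is just an example):

```python
url = "www.example.com"  # example string

# strip() removes ANY of the given characters from both ends,
# stopping at the first character that is not in the set
print(url.strip("w.moc"))          # example

# removeprefix()/removesuffix() remove one exact substring
print(url.removeprefix("www."))    # example.com
print(url.removesuffix(".com"))    # www.example

print("--Uldis--".strip("-"))      # Uldis
```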

### Checking if string contains a substring


In [13]:
text2 = "This is an example"

In [14]:
text2.startswith("This")

True

In [15]:
text2.startswith("example")

False

In [16]:
text2.endswith("example")

True

In [17]:
# checking for substring
"is an" in text2

True

In [18]:
# let's use it in an if statement:

if "is an" in text2:
    print("Substring found!")

Substring found!


In [19]:
# position of the 1st match

text2.index(" is ")

4

In [20]:
text2[4:]

' is an example'

In [21]:
help(str.index)

Help on method_descriptor:

index(...) unbound builtins.str method
    S.index(sub[, start[, end]]) -> int

    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.

    Raises ValueError when the substring is not found.



In [22]:
text2.index("not found")

ValueError: substring not found

In [23]:
help(str.find)

Help on method_descriptor:

find(...) unbound builtins.str method
    S.find(sub[, start[, end]]) -> int

    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.

    Return -1 on failure.



In [24]:
text2.find("not found")

-1

In [25]:
text2.find(" is ")

4

In [26]:
# how many times does a substring occur in text2?

text2.count("is")

2
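`find()` also takes an optional `start` position (see its `help()` output above), so it can be called repeatedly to collect the position of every match, not just the first one. `find_all()` is a hypothetical helper written for this sketch, not a built-in method:

```python
def find_all(text, sub):
    """Hypothetical helper: return the start index of every occurrence of sub."""
    positions = []
    pos = text.find(sub)
    while pos != -1:
        positions.append(pos)
        # continue searching one character after the previous match
        pos = text.find(sub, pos + 1)
    return positions

sentence = "This is an example"
print(find_all(sentence, "is"))    # [2, 5]
print(find_all(sentence, "xyz"))   # []
```

Because the search restarts at `pos + 1`, overlapping matches are found too (`find_all("aaa", "aa")` gives `[0, 1]`), whereas `count()` only counts non-overlapping matches (`"aaa".count("aa")` is `1`).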

### Different types of text content


In [29]:
text_str = "Just a string"
text_alpha = "Nospaceshere"
text_num = "123456"
text_alnum = "Password123"
text_upper = "THIS IS IMPORTANT"
text_lower = "nothing to see here"
text_whitespace = "   \t "

In [27]:
help(str.isalpha)

Help on method_descriptor:

isalpha(self, /) unbound builtins.str method
    Return True if the string is an alphabetic string, False otherwise.

    A string is alphabetic if all characters in the string are alphabetic and there
    is at least one character in the string.



In [30]:
text_alpha.isalpha()

True

In [31]:
# False, because spaces are not alphabetic characters
text_str.isalpha()

False

In [32]:
text_num.isnumeric()

True

In [33]:
# False, because the string contains a dot
"123.45".isnumeric()

False

In [34]:
text_alnum.isalnum()

True

In [35]:
text_upper.isupper()

True

In [36]:
text_lower.islower()

True

In [37]:
text_whitespace.isspace()

True
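A few edge cases of the `is...()` methods are worth knowing: they return `False` for an empty string, and `isupper()` / `islower()` only look at cased characters (letters), ignoring digits and spaces:

```python
print("".isalpha(), "".isspace())   # False False: at least one character is required
print("ABC 123".isupper())          # True: digits and the space are ignored
print("123".isupper())              # False: there are no cased characters at all
print("Password123".isalpha())      # False: digits are not alphabetic
```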

### Justifying strings

In [42]:
text3 = "123"
text3

'123'

In [43]:
text4 = "567890"

In [44]:
text3.ljust(10)

'123       '

In [46]:
text3.rjust(10)

'       123'

In [47]:
text3.center(10)

'   123    '

In [48]:
print(text3.rjust(10))
print(text4.rjust(10))

       123
    567890
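Justification is handy for printing simple aligned tables. The sketch below uses made-up example rows. f-strings with format specifiers (`<` left, `>` right, `^` center) give the same layout, and `zfill()` pads a number string with leading zeros:

```python
# made-up example data: (item name, quantity)
rows = [("apples", 3), ("pears", 12), ("plums", 150)]

for name, qty in rows:
    # name left-justified in 10 characters, quantity right-justified in 5
    print(name.ljust(10) + str(qty).rjust(5))

for name, qty in rows:
    # the same layout using f-string format specifiers
    print(f"{name:<10}{qty:>5}")

print("7".zfill(3))            # 007
print("123".center(9, "*"))    # ***123***
```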


## Exceptions, Error Handling

Exceptions and try/except commands let us handle ("catch") error situations.

- https://docs.python.org/3/tutorial/errors.html
- https://realpython.com/python-exceptions/

In [49]:
# Syntax errors (found before a program is run)

print("Missing end bracket"

SyntaxError: incomplete input (2057294690.py, line 3)

In [50]:
text

'Uldis   '

In [51]:
# Exceptions (raised when program is run)
# in this case: this string cannot be converted to an integer

print(int(text))

ValueError: invalid literal for int() with base 10: 'Uldis   '

In [52]:
print(int("123"))

123


In [53]:
int("123.45")

ValueError: invalid literal for int() with base 10: '123.45'

In [54]:
# Division by 0 error

print(100/0)

ZeroDivisionError: division by zero

In [55]:
# We can catch exceptions and do something with them

try:
    result = 100/0
    print(result)
    
except ZeroDivisionError as error:
    print("Can not divide by 0")
    
print("Continue executing the program")

Can not divide by 0
Continue executing the program


In [56]:
# "else" block is executed if there was no exception

n = 2

try:
    result = 100/n
    
except ZeroDivisionError as error:
    print("Can not divide by 0")
    
else:
    print("In 'else' block.")
    print("Result is", result)
    

In 'else' block.
Result is 50.0
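A `try` statement can also have several `except` clauses (one per exception type) and a `finally` block, which runs whether or not an exception occurred; it is typically used for clean-up. `safe_divide()` is a hypothetical function written for this sketch:

```python
def safe_divide(a, b):
    """Hypothetical example: divide a by b, returning None on errors."""
    try:
        result = a / b
    except ZeroDivisionError:
        print("Cannot divide by 0")
        result = None
    except TypeError:
        print("Both arguments must be numbers")
        result = None
    else:
        print("Result is", result)
    finally:
        # runs in every case: after try/else, or after an except block
        print("Division attempted")
    return result

safe_divide(100, 2)     # Result is 50.0, then: Division attempted
safe_divide(100, 0)     # Cannot divide by 0, then: Division attempted
safe_divide("100", 2)   # Both arguments must be numbers, then: Division attempted
```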


In [57]:
a = input("Enter an integer value: ")

try:
    b = int(a)
    
except ValueError:
    print("Text must be an integer value")


Enter an integer value:  rty


Text must be an integer value


In [58]:
b

NameError: name 'b' is not defined
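The `NameError` happens because the assignment `b = int(a)` never completed. One way to avoid this is a small function that always returns something, for example a default value when the conversion fails. `parse_int()` is a hypothetical helper for illustration. The loop also shows why trying `int()` is more reliable than checking `isnumeric()` first: `int()` accepts a sign and surrounding spaces, while `isnumeric()` is `True` for characters like "½" that `int()` rejects.

```python
def parse_int(text, default=None):
    """Hypothetical helper: return text converted to int, or default on failure."""
    try:
        return int(text)
    except ValueError:
        return default

for s in ["123", "-5", " 42 ", "123.45", "rty", "½"]:
    print(repr(s), "isnumeric:", s.isnumeric(), "parse_int:", parse_int(s))
```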

### Raising exceptions

In [59]:
# Your code may also raise exceptions

i = -5

if i < 1:
    raise Exception("Must be a positive number")

Exception: Must be a positive number

In [60]:
def test_pos_num(arg):
    
    if arg < 1:
        raise Exception("Must be a positive number")

In [61]:
test_pos_num(5)

In [62]:
test_pos_num(-1)

Exception: Must be a positive number

In [63]:
# Let's catch this exception

try:
    test_pos_num(i)
    
except Exception as error:
    print(error)
    
print("Program continues running.")

Must be a positive number
Program continues running.
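Raising the general `Exception` works, but it is usually better to raise a more specific built-in type such as `ValueError` (the type `int()` raises for bad values), or a subclass of it, so the caller can catch exactly the errors it expects. `NotPositiveError` and `check_positive()` are hypothetical names used for this sketch:

```python
class NotPositiveError(ValueError):
    """Hypothetical custom exception: a number is not positive."""

def check_positive(arg):
    """Hypothetical check: return arg, or raise NotPositiveError if arg < 1."""
    if arg < 1:
        raise NotPositiveError(f"Must be a positive number, got {arg}")
    return arg

try:
    check_positive(-5)
except ValueError as error:   # also catches the NotPositiveError subclass
    print(type(error).__name__, "->", error)

# prints: NotPositiveError -> Must be a positive number, got -5
```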


### Exercise

Write a program that asks a user to input an integer value.
- if the value is an integer, print this value
- if it is not an integer, print an error message and ask the user to input an integer number again (until they input a valid integer value)

## File Input / Output

- https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files
- [Chapter 9 – Reading and Writing Files](https://automatetheboringstuff.com/2e/chapter9/) (Automate the Boring Stuff)

We can use Jupyter `%%writefile` to create an example file to work with.

We may also want to check which directory is the current one, since that is where the file will be created.

In [64]:
# check which directory is the current one
%pwd

'/Users/captsolo/Documents/Code/LU_GeoPython/notebooks'

In [65]:
%%writefile somefile.txt
This is an example file
that we can experiment
with.

Writing somefile.txt
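`%%writefile` and `%pwd` only work inside Jupyter/IPython. In a plain Python script, a roughly equivalent sketch uses `pathlib` (covered in more detail below): `Path.cwd()` returns the current directory, and `Path.write_text()` writes a string to a file (creating or overwriting it) and returns the number of characters written:

```python
from pathlib import Path

# the current working directory (like %pwd)
print(Path.cwd())

# write the same example text as the %%writefile cell (overwrites the file)
example_file = Path("somefile.txt")
n_chars = example_file.write_text(
    "This is an example file\nthat we can experiment\nwith.\n",
    encoding="utf-8",
)
print("Wrote", n_chars, "characters to", example_file)
```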


In [66]:
## Linux and Mac
%ls

## Windows
#!dir

01 - Python Introduction.ipynb
02 - Examples.ipynb
02 - Python Introduction.ipynb
03 - Python Functions.ipynb
04 - Python Dictionaries and Sets.ipynb
05 - Python Libraries.ipynb
06 - Strings, File Input and Output.ipynb*
07 - Object Oriented Programming.ipynb*
08 - File Operations 2.ipynb
09 - NumPy.ipynb
10 - Pandas.ipynb*
11 - Matplotlib.ipynb*
11 - Plotly.ipynb
12 - NetworkX.ipynb
Addresses_AP3.xlsx
Addresses_AP3ALLCLASSES_Geocoded_withID.xlsx
Colormaps_0.html
Colormaps_1.html
Colormaps_2.html
FOLIUM.ipynb*
GeoPandas_intro.ipynb
GeoPy.ipynb
Geocode-Plot_Geonames_complexInput.ipynb
Numpy_exercises.ipynb
Numpy_exercises_hints.ipynb
Pandas Exercises.ipynb
Polyline_text_path.html
README.md
US_Unemployment_Oct2012.csv
Untitled.ipynb
Untitled1.ipynb
Untitled2.ipynb
[1m[34mdata[m[m/
[1m[34miframe_figures[m[m/
[1m[34mimg[m[m/
[1m[34min-class[m[m/
lu.html
mans_fails.py
nltk_example.ipynb
pandas_cheatsheet.pdf
program_exam

### Directories and file paths

- https://realpython.com/python-pathlib/
- [Chapter 9 – Reading and Writing Files](https://automatetheboringstuff.com/2e/chapter9/) (Automate the Boring Stuff)

In [67]:
# let's import the Path class
from pathlib import Path

In [68]:
# now we can make a Path for the current directory (.)
my_path = Path(".")
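Besides listing directories, `Path` objects make it easy to build and take apart file paths. The `/` operator joins path parts using the correct separator for the operating system. The path below is only an example and does not need to exist:

```python
from pathlib import Path

# example path, used only for illustration
csv_path = Path("data") / "covidpatients.csv"

print(csv_path)           # data/covidpatients.csv (data\covidpatients.csv on Windows)
print(csv_path.name)      # covidpatients.csv
print(csv_path.stem)      # covidpatients
print(csv_path.suffix)    # .csv
print(csv_path.parent)    # data
print(csv_path.exists())  # True only if the file really exists
```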

In [69]:
# ... and list the contents of this directory (in no particular order)

for item in my_path.iterdir():
    print(item)

nltk_example.ipynb
06 - Strings, File Input and Output.ipynb
pandas_cheatsheet.pdf
Untitled1.ipynb
FOLIUM.ipynb
.DS_Store
in-class
somefile.txt
GeoPandas_intro.ipynb
01 - Python Introduction.ipynb
04 - Python Dictionaries and Sets.ipynb
US_Unemployment_Oct2012.csv
Polyline_text_path.html
Untitled.ipynb
09 - NumPy.ipynb
03 - Python Functions.ipynb
Untitled2.ipynb
Numpy_exercises.ipynb
10 - Pandas.ipynb
mans_fails.py
07 - Object Oriented Programming.ipynb
Colormaps_2.html
Pandas Exercises.ipynb
riga.json
README.md
img
Colormaps_1.html
GeoPy.ipynb
11 - Matplotlib.ipynb
Colormaps_0.html
iframe_figures
.ipynb_checkpoints
Numpy_exercises_hints.ipynb
lu.html
us-states.json
08 - File Operations 2.ipynb
02 - Python Introduction.ipynb
program_example.py
Addresses_AP3.xlsx
data
11 - Plotly.ipynb
Addresses_AP3ALLCLASSES_Geocoded_withID.xlsx
12 - NetworkX.ipynb
05 - Python Libraries.ipynb
Geocode-Plot_Geonames_complexInput.ipynb
02 - Examples.ipynb


In [70]:
# let's order the file list

for item in sorted(my_path.iterdir()):
    print(item)

.DS_Store
.ipynb_checkpoints
01 - Python Introduction.ipynb
02 - Examples.ipynb
02 - Python Introduction.ipynb
03 - Python Functions.ipynb
04 - Python Dictionaries and Sets.ipynb
05 - Python Libraries.ipynb
06 - Strings, File Input and Output.ipynb
07 - Object Oriented Programming.ipynb
08 - File Operations 2.ipynb
09 - NumPy.ipynb
10 - Pandas.ipynb
11 - Matplotlib.ipynb
11 - Plotly.ipynb
12 - NetworkX.ipynb
Addresses_AP3.xlsx
Addresses_AP3ALLCLASSES_Geocoded_withID.xlsx
Colormaps_0.html
Colormaps_1.html
Colormaps_2.html
FOLIUM.ipynb
GeoPandas_intro.ipynb
GeoPy.ipynb
Geocode-Plot_Geonames_complexInput.ipynb
Numpy_exercises.ipynb
Numpy_exercises_hints.ipynb
Pandas Exercises.ipynb
Polyline_text_path.html
README.md
US_Unemployment_Oct2012.csv
Untitled.ipynb
Untitled1.ipynb
Untitled2.ipynb
data
iframe_figures
img
in-class
lu.html
mans_fails.py
nltk_example.ipynb
pandas_cheatsheet.pdf
program_example.py
riga.json
somefile.txt
us-states.json


In [71]:
# .glob() finds the files whose names match the given pattern
# * matches any sequence of characters

for item in my_path.glob("*.txt"):
    print(item)

somefile.txt


In [72]:
# we can also look for files in subdirectories of the given directory

# .. = the directory "above" the current directory in the directory tree
my_path2 = Path("..")

# Note: this does not work correctly in Google Colab

for item in my_path2.glob("**/05*"):
    print(item)

../notebooks/05 - Python Libraries.ipynb
../.git/objects/d7/05e40d4ce14d189a7b052e6bdb4097ed7dbfec
../notebooks/in-class/05 - Python Libraries.ipynb
../notebooks/.ipynb_checkpoints/05 - Python Libraries-checkpoint.ipynb
../notebooks/in-class/.ipynb_checkpoints/05 - Python Libraries-checkpoint.ipynb
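`Path.rglob(pattern)` is a shorthand for `glob("**/" + pattern)`: it searches the directory and all of its subdirectories. Combined with `sorted()` and `is_file()`, a sketch that lists every `.txt` file below the current directory:

```python
from pathlib import Path

start_dir = Path(".")

for item in sorted(start_dir.rglob("*.txt")):
    if item.is_file():   # skip directories whose names happen to match
        print(item)
```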


### Reading a file

Now that we have found some files, we can look at reading and writing files.

In [73]:
fname = "somefile.txt"

In [74]:
file = open(fname, encoding="utf-8")

In [75]:
help(file)

Help on TextIOWrapper in module _io object:

class TextIOWrapper(_TextIOBase)
 |  TextIOWrapper(buffer, encoding=None, errors=None, newline=None, line_buffering=False, write_through=False)
 |
 |  Character and line based layer over a BufferedIOBase object, buffer.
 |
 |  encoding gives the name of the encoding that the stream will be
 |  decoded or encoded with. It defaults to locale.getencoding().
 |
 |  errors determines the strictness of encoding and decoding (see
 |  help(codecs.Codec) or the documentation for codecs.register) and
 |  defaults to "strict".
 |
 |  newline controls how line endings are handled. It can be None, '',
 |  '\n', '\r', and '\r\n'.  It works as follows:
 |
 |  * On input, if newline is None, universal newlines mode is
 |    enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
 |    these are translated into '\n' before being returned to the
 |    caller. If it is '', universal newline mode is enabled, but line
 |    endings are returned to the 

In [76]:
help(file.read)

Help on built-in function read:

read(size=-1, /) method of _io.TextIOWrapper instance
    Read at most size characters from stream.

    Read from underlying buffer until we have size characters or we hit EOF.
    If size is negative or omitted, read until EOF.



In [77]:
data = file.read()

print(data)

This is an example file
that we can experiment
with.



In [78]:
# it is good practice to close a file when you're done with it
file.close()

### Automatically closing a file using `with`

In [79]:
with open(fname, encoding="utf-8") as file:
    data = file.read()

# file is closed automatically when the `with` block ends

print(data)

This is an example file
that we can experiment
with.
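The `with` block is roughly equivalent to the following `try`/`finally` code, which closes the file even if an error happens while reading; the `file.closed` attribute lets us check this. The first lines recreate `somefile.txt` only if it is missing, so the sketch also runs on its own:

```python
from pathlib import Path

fname = "somefile.txt"
if not Path(fname).exists():
    # recreate the example file only if it is missing
    Path(fname).write_text("This is an example file\nthat we can experiment\nwith.\n", encoding="utf-8")

# manual version: finally guarantees close() even if read() raises an exception
file = open(fname, encoding="utf-8")
try:
    data = file.read()
finally:
    file.close()
print(file.closed)   # True

# with version: the same clean-up happens automatically
with open(fname, encoding="utf-8") as file:
    data = file.read()
print(file.closed)   # True
```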



In [80]:
# we can also iterate (go through) a file line by line

with open(fname, encoding="utf-8") as file:
    
    for line in file:
        print(">", line)

> This is an example file

> that we can experiment

> with.



In [81]:
# strip the newline character at the end of each line to get rid of the empty lines

with open(fname, encoding="utf-8") as file:
    
    for line in file:
        line = line.strip()
        print(">", line)

> This is an example file
> that we can experiment
> with.


In [82]:
# let's print out line numbers

with open(fname, encoding="utf-8") as file:
    
    for num, line in enumerate(file):
        line = line.strip()
        print(num, ">", line)

0 > This is an example file
1 > that we can experiment
2 > with.
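Two small variations: `str.splitlines()` splits the whole text into a list of lines without the trailing `"\n"` characters, and `enumerate(..., start=1)` numbers lines from 1 instead of 0. The first lines recreate the example file only if it is missing, so the sketch also runs on its own:

```python
from pathlib import Path

fname = "somefile.txt"
if not Path(fname).exists():
    # recreate the example file only if it is missing
    Path(fname).write_text("This is an example file\nthat we can experiment\nwith.\n", encoding="utf-8")

with open(fname, encoding="utf-8") as file:
    lines = file.read().splitlines()   # list of lines, newline characters removed

print(lines)

for num, line in enumerate(lines, start=1):
    print(num, ">", line)
```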


### Writing a file

Note: it is **important** to **close a file** after writing is finished; until then, written data may still be sitting in a buffer instead of in the file. The `with` statement closes the file automatically.

In [83]:
fname_out = "test123.txt"

text = """
This is an example text string.

f.write(string) writes the contents of string to the file, returning the number of characters written.
"""

with open(fname_out, "w", encoding="utf-8") as file_out:
    
    file_out.write(text)
    
    file_out.write(str(123) + "\n")
    file_out.write(str(456) + "\n")    

In [84]:
# Read a file to see what was written to it

def read_file(fname):

    with open(fname, encoding="utf-8") as file:
        data = file.read()
        print(data)
        
read_file(fname_out)


This is an example text string.

f.write(string) writes the contents of string to the file, returning the number of characters written.
123
456



In [85]:
# You can also use print() to write to a file

with open(fname_out, "w", encoding="utf-8") as file_out:

    print(text, file=file_out)
    print(str(123), file=file_out)
    print(str(789), file=file_out)

In [86]:
read_file(fname_out)


This is an example text string.

f.write(string) writes the contents of string to the file, returning the number of characters written.

123
789
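`write()` does not add newline characters, and neither does `writelines()`, which writes every string from a list. Opening a file in `"a"` (append) mode adds to the end of an existing file instead of overwriting it as `"w"` does. The file name `test_append.txt` is just an example:

```python
fname_append = "test_append.txt"   # example file name

with open(fname_append, "w", encoding="utf-8") as file_out:
    n = file_out.write("first line\n")   # returns the number of characters written
    file_out.writelines(["second line\n", "third line\n"])

with open(fname_append, "a", encoding="utf-8") as file_out:
    file_out.write("appended line\n")    # added after the existing content

with open(fname_append, encoding="utf-8") as file_in:
    print(file_in.read(), end="")

print("write() returned", n)   # write() returned 11
```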



### CSV files

- https://docs.python.org/3/library/csv.html
- https://realpython.com/python-csv/

Comma-Separated Values (CSV), in Latvian *ar komatu (vai citu atdalītāju) atdalītas vērtības*:
- rows of values (table cells) separated by a "," character (or by another separator character)
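A CSV value may itself contain the separator character; it is then wrapped in double quotes. That is why it is safer to use the `csv` module than to split lines with `str.split(",")`. A minimal sketch with a made-up line, using `io.StringIO` to treat a string as a file:

```python
import csv
import io

sample_line = 'apples,"red, green",3'   # made-up example row

print(sample_line.split(","))
# ['apples', '"red', ' green"', '3']  -> the quoted value is broken apart

print(next(csv.reader(io.StringIO(sample_line))))
# ['apples', 'red, green', '3']       -> the quotes are handled correctly
```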

---

#### Open Data in CSV format

Download the COVID-19 open dataset (or any other CSV dataset) from data.gov.lv and save it in the current directory as `covidpatients.csv`:
- https://data.gov.lv/dati/lv/dataset/stacionaru-operativie-dati-par-covid19

In [88]:
fname_csv = "covidpatients.csv"

In [89]:
from itertools import islice

In [91]:
with open(fname_csv, encoding="utf-8") as file_csv:

    # here we use islice() to limit the input to the first 10 lines
    first_10_rows = islice(file_csv, 10)
    
    for line in first_10_rows:
        print(line)

Datums;ĀI kods;ĀI nosaukums;Kopā;Jauni;Pamata diagnoze;Blakus diagnoze;Smaga slimības gaita;t.sk. Invazīva MPV;Vidēja slimības gaita;Miruši;Izrakstīti;Pārvesti;t.sk. uz augstāka līmeņa;t.sk. uz zemāka līmeņa;t.sk. uz tāda paša līmeņa

2022-02-01T00:00:00;320200001-01;Aizkraukles slimnīca;1;1;0;1;0;0;0;0;0;0;0;0;0

2022-02-01T00:00:00;360200027-01;Alūksnes slimnīca;0;0;0;6;0;0;0;0;1;0;0;0;0

2022-02-01T00:00:00;500200052-02;Balvu un Gulbenes slimnīcu apvienība;18;;12;6;2;0;16;1;1;0;0;0;0

2022-02-01T00:00:00;400200024-01;Bauskas slimnīca;2;0;4;0;0;0;4;0;1;0;0;0;0

2022-02-01T00:00:00;010011804-01;Bērnu klīniskā universitātes slimnīca;9;5;6;3;;;9;;9;;;;2

2022-02-01T00:00:00;661400011-01;Bērnu psihoneiroloģiskā slimnīca "Ainaži";2;1;0;1;0;0;0;0;0;0;0;0;0

2022-02-01T00:00:00;420200052-01;Cēsu klīnika;11;2;11;0;1;1;8;0;0;0;0;;0

2022-02-01T00:00:00;050012101-01;Daugavpils psihoneiroloģiskā slimnīca;6;1;3;3;0;0;1;0;0;1;0;0;1

2022-02-01T00:00:00;050012101-02;Daugavpils psihoneiroloģiskās s

In [None]:
# A-ha! This time ";" is the separator character
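Instead of inspecting the file by eye, the `csv.Sniffer` class can guess the separator from a sample of the text. Guessing can fail (it then raises `csv.Error`), so treat the result as a hint. The sample below is a shortened excerpt of the first lines shown above; for the real file you could pass e.g. `file_csv.read(2048)` and then call `file_csv.seek(0)` before reading the rows:

```python
import csv

# shortened excerpt of the first lines of covidpatients.csv
sample = (
    "Datums;ĀI kods;ĀI nosaukums;Kopā\n"
    "2022-02-01T00:00:00;320200001-01;Aizkraukles slimnīca;1\n"
)

# restrict the guess to the separators we consider likely
dialect = csv.Sniffer().sniff(sample, delimiters=";,")
print(dialect.delimiter)   # ;
```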

In [92]:
# Let's read it as a CSV file

import csv

with open(fname_csv, encoding="utf-8", newline="") as file_csv:
    
    rdr = csv.reader(file_csv)
    
    rdr_10_rows = islice(rdr, 10)
    
    for row in rdr_10_rows:
        print(row)
        

['Datums;ĀI kods;ĀI nosaukums;Kopā;Jauni;Pamata diagnoze;Blakus diagnoze;Smaga slimības gaita;t.sk. Invazīva MPV;Vidēja slimības gaita;Miruši;Izrakstīti;Pārvesti;t.sk. uz augstāka līmeņa;t.sk. uz zemāka līmeņa;t.sk. uz tāda paša līmeņa']
['2022-02-01T00:00:00;320200001-01;Aizkraukles slimnīca;1;1;0;1;0;0;0;0;0;0;0;0;0']
['2022-02-01T00:00:00;360200027-01;Alūksnes slimnīca;0;0;0;6;0;0;0;0;1;0;0;0;0']
['2022-02-01T00:00:00;500200052-02;Balvu un Gulbenes slimnīcu apvienība;18;;12;6;2;0;16;1;1;0;0;0;0']
['2022-02-01T00:00:00;400200024-01;Bauskas slimnīca;2;0;4;0;0;0;4;0;1;0;0;0;0']
['2022-02-01T00:00:00;010011804-01;Bērnu klīniskā universitātes slimnīca;9;5;6;3;;;9;;9;;;;2']
['2022-02-01T00:00:00;661400011-01;Bērnu psihoneiroloģiskā slimnīca "Ainaži";2;1;0;1;0;0;0;0;0;0;0;0;0']
['2022-02-01T00:00:00;420200052-01;Cēsu klīnika;11;2;11;0;1;1;8;0;0;0;0;;0']
['2022-02-01T00:00:00;050012101-01;Daugavpils psihoneiroloģiskā slimnīca;6;1;3;3;0;0;1;0;0;1;0;0;1']
['2022-02-01T00:00:00;050012101-02;Da

In [93]:
# Separator is ";"

data = []

with open(fname_csv, encoding="utf-8", newline="") as file_csv:
    
    rdr = csv.reader(file_csv, delimiter=";")
    
    rdr_10_rows = islice(rdr, 10)
    
    for row in rdr_10_rows:
        print(row)
        data.append(row)
    

['Datums', 'ĀI kods', 'ĀI nosaukums', 'Kopā', 'Jauni', 'Pamata diagnoze', 'Blakus diagnoze', 'Smaga slimības gaita', 't.sk. Invazīva MPV', 'Vidēja slimības gaita', 'Miruši', 'Izrakstīti', 'Pārvesti', 't.sk. uz augstāka līmeņa', 't.sk. uz zemāka līmeņa', 't.sk. uz tāda paša līmeņa']
['2022-02-01T00:00:00', '320200001-01', 'Aizkraukles slimnīca', '1', '1', '0', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0']
['2022-02-01T00:00:00', '360200027-01', 'Alūksnes slimnīca', '0', '0', '0', '6', '0', '0', '0', '0', '1', '0', '0', '0', '0']
['2022-02-01T00:00:00', '500200052-02', 'Balvu un Gulbenes slimnīcu apvienība', '18', '', '12', '6', '2', '0', '16', '1', '1', '0', '0', '0', '0']
['2022-02-01T00:00:00', '400200024-01', 'Bauskas slimnīca', '2', '0', '4', '0', '0', '0', '4', '0', '1', '0', '0', '0', '0']
['2022-02-01T00:00:00', '010011804-01', 'Bērnu klīniskā universitātes slimnīca', '9', '5', '6', '3', '', '', '9', '', '9', '', '', '', '2']
['2022-02-01T00:00:00', '661400011-01', 'Bērnu psi

In [94]:
data[0][2]

'ĀI nosaukums'

In [95]:
data[1][2]

'Aizkraukles slimnīca'
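Accessing cells by position (`data[1][2]`) depends on knowing the column order. `csv.DictReader` instead uses the header row as keys, so each row becomes a dictionary indexed by column name. The sketch uses a shortened excerpt of the data shown above; for the real file, pass the opened `file_csv` instead of the `io.StringIO` object. Empty cells arrive as `""`, so they need handling before a conversion like `int()`; treating them as 0 is an assumption made only for this sketch:

```python
import csv
import io

# shortened excerpt of covidpatients.csv (first four columns, two rows)
sample = (
    "Datums;ĀI kods;ĀI nosaukums;Kopā\n"
    "2022-02-01T00:00:00;320200001-01;Aizkraukles slimnīca;1\n"
    "2022-02-01T00:00:00;360200027-01;Alūksnes slimnīca;0\n"
)

dict_rdr = csv.DictReader(io.StringIO(sample), delimiter=";")

totals = {}
for row in dict_rdr:
    # empty cells are read as "", treated as 0 here (an assumption for this sketch)
    totals[row["ĀI nosaukums"]] = int(row["Kopā"] or 0)

print(totals)   # {'Aizkraukles slimnīca': 1, 'Alūksnes slimnīca': 0}
```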

In [96]:
from pprint import pprint

pprint(data[:3])

[['Datums',
  'ĀI kods',
  'ĀI nosaukums',
  'Kopā',
  'Jauni',
  'Pamata diagnoze',
  'Blakus diagnoze',
  'Smaga slimības gaita',
  't.sk. Invazīva MPV',
  'Vidēja slimības gaita',
  'Miruši',
  'Izrakstīti',
  'Pārvesti',
  't.sk. uz augstāka līmeņa',
  't.sk. uz zemāka līmeņa',
  't.sk. uz tāda paša līmeņa'],
 ['2022-02-01T00:00:00',
  '320200001-01',
  'Aizkraukles slimnīca',
  '1',
  '1',
  '0',
  '1',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0',
  '0'],
 ['2022-02-01T00:00:00',
  '360200027-01',
  'Alūksnes slimnīca',
  '0',
  '0',
  '0',
  '6',
  '0',
  '0',
  '0',
  '0',
  '1',
  '0',
  '0',
  '0',
  '0']]


In [97]:
# Writing a CSV file

fname_out = "test_data.csv"

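# newline="" prevents extra blank lines between rows on Windows (as the csv module docs recommend)
with open(fname_out, "w", encoding="utf-8", newline="") as file_out: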
    
    writer = csv.writer(file_out)
    
    for row in data:
        writer.writerow(row)
        

In [98]:
# we defined the read_file() function earlier 
read_file(fname_out)

Datums,ĀI kods,ĀI nosaukums,Kopā,Jauni,Pamata diagnoze,Blakus diagnoze,Smaga slimības gaita,t.sk. Invazīva MPV,Vidēja slimības gaita,Miruši,Izrakstīti,Pārvesti,t.sk. uz augstāka līmeņa,t.sk. uz zemāka līmeņa,t.sk. uz tāda paša līmeņa
2022-02-01T00:00:00,320200001-01,Aizkraukles slimnīca,1,1,0,1,0,0,0,0,0,0,0,0,0
2022-02-01T00:00:00,360200027-01,Alūksnes slimnīca,0,0,0,6,0,0,0,0,1,0,0,0,0
2022-02-01T00:00:00,500200052-02,Balvu un Gulbenes slimnīcu apvienība,18,,12,6,2,0,16,1,1,0,0,0,0
2022-02-01T00:00:00,400200024-01,Bauskas slimnīca,2,0,4,0,0,0,4,0,1,0,0,0,0
2022-02-01T00:00:00,010011804-01,Bērnu klīniskā universitātes slimnīca,9,5,6,3,,,9,,9,,,,2
2022-02-01T00:00:00,661400011-01,"Bērnu psihoneiroloģiskā slimnīca ""Ainaži""",2,1,0,1,0,0,0,0,0,0,0,0,0
2022-02-01T00:00:00,420200052-01,Cēsu klīnika,11,2,11,0,1,1,8,0,0,0,0,,0
2022-02-01T00:00:00,050012101-01,Daugavpils psihoneiroloģiskā slimnīca,6,1,3,3,0,0,1,0,0,1,0,0,1
2022-02-01T00:00:00,050012101-02,Daugavpils psihoneiroloģiskās slimnī
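The writer above uses `,` as the separator (note how the value containing quotes was wrapped in `"..."` with the inner quotes doubled); to keep `;`, pass `delimiter=";"`. For rows stored as dictionaries there is `csv.DictWriter`, which can also write the header row. The file name `test_dict.csv` is only an example:

```python
import csv

dict_rows = [
    {"ĀI nosaukums": "Aizkraukles slimnīca", "Kopā": 1},
    {"ĀI nosaukums": "Alūksnes slimnīca", "Kopā": 0},
]

# example file name; newline="" as recommended for files used with the csv module
with open("test_dict.csv", "w", newline="", encoding="utf-8") as file_out:
    writer = csv.DictWriter(file_out, fieldnames=["ĀI nosaukums", "Kopā"], delimiter=";")
    writer.writeheader()          # first row: the field names
    writer.writerows(dict_rows)   # one row per dictionary
```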

#### Reading files using Pandas

- https://realpython.com/python-csv/#parsing-csv-files-with-the-pandas-library

- https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html#pandas.DataFrame.to_csv

In [99]:
# %pip installs the package into the environment that the notebook kernel uses
%pip install pandas



In [100]:
import pandas as pd

In [101]:
print(fname_csv)

covidpatients.csv


In [102]:
dataframe = pd.read_csv(fname_csv, delimiter=";")

dataframe[:10]

,Datums,ĀI kods,ĀI nosaukums,Kopā,Jauni,Pamata diagnoze,Blakus diagnoze,Smaga slimības gaita,t.sk. Invazīva MPV,Vidēja slimības gaita,Miruši,Izrakstīti,Pārvesti,t.sk. uz augstāka līmeņa,t.sk. uz zemāka līmeņa,t.sk. uz tāda paša līmeņa
0,2022-02-01T00:00:00,320200001-01,Aizkraukles slimnīca,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2022-02-01T00:00:00,360200027-01,Alūksnes slimnīca,0.0,0.0,0.0,6.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
2,2022-02-01T00:00:00,500200052-02,Balvu un Gulbenes slimnīcu apvienība,18.0,,12.0,6.0,2.0,0.0,16.0,1.0,1.0,0.0,0.0,0.0,0.0
3,2022-02-01T00:00:00,400200024-01,Bauskas slimnīca,2.0,0.0,4.0,0.0,0.0,0.0,4.0,0.0,1.0,0.0,0.0,0.0,0.0
4,2022-02-01T00:00:00,010011804-01,Bērnu klīniskā universitātes slimnīca,9.0,5.0,6.0,3.0,,,9.0,,9.0,,,,2.0
5,2022-02-01T00:00:00,661400011-01,"Bērnu psihoneiroloģiskā slimnīca ""Ainaži""",2.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,2022-02-01T00:00:00,420200052-01,Cēsu klīnika,11.0,2.0,11.0,0.0,1.0,1.0,8.0,0.0,0.0,0.0,0.0,,0.0
7,2022-02-01T00:00:00,050012101-01,Daugavpils psihoneiroloģiskā slimnīca,6.0,1.0,3.0,3.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0
8,2022-02-01T00:00:00,050012101-02,Daugavpils psihoneiroloģiskās slimnīcas Aknīst...,1.0,0.0,,1.0,,,1.0,,,,,,
9,2022-02-01T00:00:00,050020401-01,Daugavpils reģionālā slimnīca,81.0,10.0,59.0,22.0,4.0,1.0,55.0,1.0,10.0,,,,


In [103]:
dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14434 entries, 0 to 14433
Data columns (total 16 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Datums                     14434 non-null  object 
 1   ĀI kods                    14434 non-null  object 
 2   ĀI nosaukums               14434 non-null  object 
 3   Kopā                       14257 non-null  float64
 4   Jauni                      12355 non-null  float64
 5   Pamata diagnoze            11786 non-null  float64
 6   Blakus diagnoze            13536 non-null  float64
 7   Smaga slimības gaita       10028 non-null  float64
 8   t.sk. Invazīva MPV         9338 non-null   float64
 9   Vidēja slimības gaita      11560 non-null  float64
 10  Miruši                     9790 non-null   float64
 11  Izrakstīti                 11649 non-null  float64
 12  Pārvesti                   8527 non-null   float64
 13  t.sk. uz augstāka līmeņa   7878 non-null   flo

In [104]:
dataframe.head()

,Datums,ĀI kods,ĀI nosaukums,Kopā,Jauni,Pamata diagnoze,Blakus diagnoze,Smaga slimības gaita,t.sk. Invazīva MPV,Vidēja slimības gaita,Miruši,Izrakstīti,Pārvesti,t.sk. uz augstāka līmeņa,t.sk. uz zemāka līmeņa,t.sk. uz tāda paša līmeņa
0,2022-02-01T00:00:00,320200001-01,Aizkraukles slimnīca,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2022-02-01T00:00:00,360200027-01,Alūksnes slimnīca,0.0,0.0,0.0,6.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
2,2022-02-01T00:00:00,500200052-02,Balvu un Gulbenes slimnīcu apvienība,18.0,,12.0,6.0,2.0,0.0,16.0,1.0,1.0,0.0,0.0,0.0,0.0
3,2022-02-01T00:00:00,400200024-01,Bauskas slimnīca,2.0,0.0,4.0,0.0,0.0,0.0,4.0,0.0,1.0,0.0,0.0,0.0,0.0
4,2022-02-01T00:00:00,010011804-01,Bērnu klīniskā universitātes slimnīca,9.0,5.0,6.0,3.0,,,9.0,,9.0,,,,2.0


In [105]:
dataframe[2:4]

,Datums,ĀI kods,ĀI nosaukums,Kopā,Jauni,Pamata diagnoze,Blakus diagnoze,Smaga slimības gaita,t.sk. Invazīva MPV,Vidēja slimības gaita,Miruši,Izrakstīti,Pārvesti,t.sk. uz augstāka līmeņa,t.sk. uz zemāka līmeņa,t.sk. uz tāda paša līmeņa
2,2022-02-01T00:00:00,500200052-02,Balvu un Gulbenes slimnīcu apvienība,18.0,,12.0,6.0,2.0,0.0,16.0,1.0,1.0,0.0,0.0,0.0,0.0
3,2022-02-01T00:00:00,400200024-01,Bauskas slimnīca,2.0,0.0,4.0,0.0,0.0,0.0,4.0,0.0,1.0,0.0,0.0,0.0,0.0


In [106]:
df2 = dataframe[["ĀI nosaukums", "Kopā"]]

df2[:5]

,ĀI nosaukums,Kopā
0,Aizkraukles slimnīca,1.0
1,Alūksnes slimnīca,0.0
2,Balvu un Gulbenes slimnīcu apvienība,18.0
3,Bauskas slimnīca,2.0
4,Bērnu klīniskā universitātes slimnīca,9.0


In [107]:
dataframe["Kopā"].sum()

163894.0
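`read_csv()` has many optional parameters; for example, `usecols` loads only the listed columns and `parse_dates` converts a column to date/time values. `groupby()` can then sum the values per group. The sketch uses a small made-up table ("Hospital A", "Hospital B" and their values are invented) read from a string with `io.StringIO`:

```python
import io
import pandas as pd

# made-up sample data in the same format as covidpatients.csv
sample = (
    "Datums;ĀI kods;ĀI nosaukums;Kopā\n"
    "2022-02-01T00:00:00;001;Hospital A;1\n"
    "2022-02-01T00:00:00;002;Hospital B;2\n"
    "2022-02-02T00:00:00;001;Hospital A;3\n"
)

sample_df = pd.read_csv(
    io.StringIO(sample),
    sep=";",
    usecols=["Datums", "ĀI nosaukums", "Kopā"],  # skip the "ĀI kods" column
    parse_dates=["Datums"],                      # convert text to datetime values
)

# total of "Kopā" per hospital
print(sample_df.groupby("ĀI nosaukums")["Kopā"].sum())
```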

In [109]:
# Writing to a CSV file

fname_out_pandas = "test_pandas.csv"
df2.to_csv(fname_out_pandas)

In [110]:
# Print contents of the CSV file

with open(fname_out_pandas, encoding="utf-8") as in_file:
    
    data = islice(in_file, 6)
    
    for line in data:
        print(line, end="")

,ĀI nosaukums,Kopā
0,Aizkraukles slimnīca,1.0
1,Alūksnes slimnīca,0.0
2,Balvu un Gulbenes slimnīcu apvienība,18.0
3,Bauskas slimnīca,2.0
4,Bērnu klīniskā universitātes slimnīca,9.0
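The first, unnamed column in the output above is the DataFrame index (the row numbers). Passing `index=False` to `to_csv()` leaves it out, and `sep` changes the separator. When no file name is given, `to_csv()` returns the CSV text as a string, which is convenient for a quick check:

```python
import pandas as pd

small_df = pd.DataFrame({"ĀI nosaukums": ["Aizkraukles slimnīca"], "Kopā": [1.0]})

print(small_df.to_csv())                       # includes the index column
print(small_df.to_csv(index=False, sep=";"))   # no index, ";" as the separator
```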


In [111]:
fname_out_excel = "test_pandas.xlsx"

In [112]:
# writing .xlsx files needs an Excel writer package such as openpyxl (pip install openpyxl)
df2.to_excel(fname_out_excel)

In [113]:
!ls *.xlsx

Addresses_AP3.xlsx
Addresses_AP3ALLCLASSES_Geocoded_withID.xlsx
test_pandas.xlsx


### JSON files

JSON (JavaScript Object Notation) files let us save nested Python data structures (dictionaries, lists, strings, numbers, ...) to a file and read them back from it.

https://www.json.org/json-en.html

```
json_object = {
  "key 1": "value 1",
  "key 2": ["value 2", "is", "a", "list"],
  "key 3": {"lists and dictionaries": "can be nested"}
}
```

To do this, we can use the Python [json](https://docs.python.org/3/library/json.html) library:

- json.dump() – save structured data to a JSON file
- json.load() – read structured data from a JSON file

- json.dumps() – return structured data as a JSON string
- json.loads() – read structured data from a JSON string
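`json.dumps()` and `json.loads()` are handy for checking what the JSON text looks like. By default, non-ASCII characters such as the Latvian letters in this dataset are written as `\uXXXX` escapes; `ensure_ascii=False` keeps them readable, and `indent` adds line breaks and indentation. The same keyword arguments also work with `json.dump()`:

```python
import json

record = {"ĀI nosaukums": "Cēsu klīnika", "Kopā": 11}

print(json.dumps(record))                                # non-ASCII letters escaped
print(json.dumps(record, ensure_ascii=False, indent=2))  # readable and indented

json_text = json.dumps(record, ensure_ascii=False)
print(json.loads(json_text) == record)                   # True: the round trip works
```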



In [114]:
import json

In [115]:
# data to be saved

data = {
  "key 1": "value 1",
  "key 2": ["value 2", "is", "a", "list"],
  "key 3": {"lists and dictionaries": "can be nested"}
}

In [116]:
data

{'key 1': 'value 1',
 'key 2': ['value 2', 'is', 'a', 'list'],
 'key 3': {'lists and dictionaries': 'can be nested'}}

In [117]:
# save data to a JSON file

json_file_name = "test_data.json"

with open(json_file_name, "w", encoding="utf-8") as file_out:

    # write the data object to a JSON file
    json.dump(data, file_out)

In [118]:
# let's look at the file that we created

with open(json_file_name, "r", encoding="utf-8") as file_in:
    for line in file_in:
        print(line)

{"key 1": "value 1", "key 2": ["value 2", "is", "a", "list"], "key 3": {"lists and dictionaries": "can be nested"}}


In [119]:
# now we can load this file back in, into a new variable

with open(json_file_name, "r", encoding="utf-8") as file_in:
    new_data = json.load(file_in)

In [120]:
new_data

{'key 1': 'value 1',
 'key 2': ['value 2', 'is', 'a', 'list'],
 'key 3': {'lists and dictionaries': 'can be nested'}}
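Not every Python value survives a JSON round trip unchanged: JSON object keys are always strings and JSON has no tuples, so integer keys come back as strings and tuples come back as lists. A quick check:

```python
import json

original = {1: ("a", "b"), "empty": None}
restored = json.loads(json.dumps(original))

print(restored)              # {'1': ['a', 'b'], 'empty': None}
print(restored == original)  # False
```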

## Summary

- Text strings (continued)
- Exceptions and error handling
- File input / output
  - CSV files
  - JSON files