
# String, List and File Operations

#### String Operations

In [None]:
# Print to the console
print("Hello")

# Multiline strings with triple quotes
m1 = """
This string will
even work with
multiple lines.
"""
print(m1)

In [None]:
# Strings are immutable sequences of Unicode characters in Python
# There is no separate char type; a single character is a string of length 1

# We can index into strings with the standard 0 based index
print(m1[4])

# Loops in python
# For and While
# To print on a single line with multiple prints use end=''
for char in "Hippopotamus":
  print(char.upper(), end='')

print(len('Hippopotamus'))
if 'popo' in 'hippopotamus':
  print('True')

In [None]:
# Modifying strings

# Change the casing
b = 'This is a string.'
print(b.upper())
print(b.lower())
print(b.title())


# Replace a part of the string
# string.replace('old', 'new') returns a new string; b itself is unchanged
print(b.replace('string', 'sentence'))


In [None]:
# Messy strings
# strip() and split()
c = "    This is a messy string.   ,"
c = c.strip(" ,")
print(c)

d = ",,,,rttgg......banana......rrr"
dip = d.strip(',.grt')
print(dip)

# split() to separate strings into a list of words
words = c.split()
print(words)

# Usually combine the two
e = "   This is a messy string. This is a second instance   "
woe = e.strip(" .").split(" ")
print(woe)

# Set of all words in the 'woe' list, no duplicates. Sets are unordered, so the order may change
unique_words = set(woe)
print(unique_words)

# Strip a specific side
print(e.rstrip())
print(e.lstrip())
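
One detail worth seeing in isolation is the difference between `split()` and `split(" ")`. The sketch below uses a made-up string (`messy` is a hypothetical name, not from the cells above): with no argument, `split()` collapses runs of whitespace, while `split(" ")` cuts at every single space and can leave empty strings behind.

```python
messy = "  one   two  three  "  # hypothetical sample string

# No argument: split on any run of whitespace and drop the empty pieces
clean_words = messy.split()
print(clean_words)

# Explicit " ": split at every single space, so empty strings appear
raw_pieces = messy.split(" ")
print(raw_pieces)

# A common way to normalize spacing: split, then join with one space
normalized = " ".join(messy.split())
print(normalized)
```

If the input may contain repeated spaces, `split()` with no argument is usually the safer choice.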


In [None]:
# Combining strings!!

one = 'Hello'
two = 'World!'
print( one + ' ' + two)
print(f"{one} {two}")
print(one, two)

print("There's no issue here.")
print('How about "this?"')
print('This isn\'t a problem.')

# Text files have a line-ending character.
# For Windows it's \r\n, and for Linux/macOS it's \n.

# Other escape sequences include \t (tab), \\ (backslash), \' and \" (quotes)

Hello World!
Hello World!
Hello World!
There's no issue here.
How about "this?"
This isn't a problem.
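
As a rough illustration of the escape characters mentioned in the comments above, here is a minimal sketch. The variable names are made up for this example.

```python
# A few common escape sequences inside ordinary (non-raw) string literals
tabbed = "name\tvalue"        # \t is a single tab character
two_lines = "first\nsecond"   # \n is a single newline character
backslash = "C:\\temp"        # \\ is one literal backslash
windows_line = "line\r\n"     # Windows line ending: carriage return + newline

print(two_lines)
print(len(tabbed))             # each escape counts as one character
print(repr(windows_line))      # repr() shows the escapes instead of applying them
print(two_lines.splitlines())  # splitlines() handles both \n and \r\n
```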


In [None]:
# Dynamic Strings
name = input("What is your name? ")
text = "Hello {0}!".format(name)
print(text)

# f-string (formatted string literal)
text = f'Welcome {name}'
print(text)

# r-string (raw string): backslashes are kept literally
print('\\//\\//')
print(r'\\//\\//')

# b-string
# bytes literal
# it holds raw bytes (integers 0-255) rather than Unicode text; it is still a regular Python object
bstring = b'Hello'
print(bstring)

What is your name? nateee
Hello nateee!
Welcome nateee
\//\//
\\//\\//
b'Hello'


- String Encoding
- Computers can't store letters and symbols directly in memory
- Human-readable characters must be converted into bytes
- When text is saved to disk, it is stored as a series of bytes according to an encoding scheme
- There are several different ways to encode the text we see into bytes
- Python 3 strings are Unicode, and UTF-8 is the default for source files and for str.encode(); ASCII only covers code points 0-127
- Other encodings (e.g. Latin-1, UTF-16) are available too

- URLs with funky characters

In [None]:
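import urllib.parse

# \u265E is a Unicode escape (the black chess knight)
print('\u265E')
url = 'www.example.com?var=This%20is%20a%20simple%20%26%20short%20test.'
# unquote() decodes percent-encoded characters (%20 is a space, %26 is &)
print(urllib.parse.unquote(url))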

# Remember this for working with APIs
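
To tie the string-encoding notes to something runnable, here is a small sketch of `str.encode()`, `bytes.decode()`, and the percent-encoding round trip. The sample text is made up for this example. The URL part reuses the sentence from the cell above.

```python
from urllib.parse import quote, unquote

text = "café ♞"  # hypothetical sample text with non-ASCII characters

# str.encode() turns text into bytes; UTF-8 is the default in Python 3
utf8_bytes = text.encode()         # same as text.encode("utf-8")
print(utf8_bytes)
print(len(text), len(utf8_bytes))  # number of characters vs number of bytes

# bytes.decode() reverses the process
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == text)

# ASCII only covers code points 0-127, so it cannot represent é or ♞
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("ASCII can encode it:", ascii_ok)

# Percent-encoding for URLs: quote() escapes, unquote() restores
query = quote("This is a simple & short test.")
print(query)
print(unquote(query))
```

When calling an API, it is usually safer to let `quote()` (or the HTTP library) do the escaping than to build the encoded string by hand.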

#### List Operations

In [None]:
# Lists

fruits = ['apple', 'cherry', 'grape', 'kiwi', 'mango', 'orange', 'peach']

# Index into lists with 0-based index
print(fruits[1])

# Python also supports negative indexes
# -1 is the last item, -2 is the second to last
print(fruits[-2])

cherry
orange


In [None]:
# Slicing
# Subset of the list
# Slicing follows the format [start:stop:step]
# It includes the element at start and stops before the element at stop

fruits[1:3]

# Out-of-range slice bounds are clamped, so this returns the whole list
fruits[0:9]

# We still can't access a specific out-of-bounds element
# fruits[9] raises an IndexError, as there is no element at that index

print(fruits[:3]) # everything before index 3
print(fruits[3:]) # index 3 and everything after

['apple', 'cherry', 'grape']
['kiwi', 'mango', 'orange', 'peach']
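
The step part of a slice is not exercised above, so here is a short sketch. It uses a separate list (`sample_fruits`, a name made up for this example) so the notebook's `fruits` list is left alone.

```python
sample_fruits = ['apple', 'cherry', 'grape', 'kiwi', 'mango', 'orange', 'peach']

every_other = sample_fruits[::2]      # start and stop omitted, step of 2
middle = sample_fruits[1:6:2]         # indexes 1, 3 and 5
last_three = sample_fruits[-3:]       # a negative start counts from the end
past_the_end = sample_fruits[5:100]   # slice bounds are clamped, no error

print(every_other)
print(middle)
print(last_three)
print(past_the_end)
```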


In [None]:
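# Reversing a list
# reverse() works in place and returns None
fruits.reverse()
print(fruits)

# Slicing with a step of -1 returns a reversed copy and leaves the list unchanged
print(fruits[::-1])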


['apple', 'cherry', 'grape', 'kiwi', 'mango', 'orange', 'peach']
['peach', 'orange', 'mango', 'kiwi', 'grape', 'cherry', 'apple']
None


In [None]:
text = 'This is a string.'
print(text[:4])
print(text[4:])

- Manipulating Lists
- We can replace items at a given index or slice
- If the slice and its replacement have different lengths, the list will change length


In [None]:
fruits[1] = 'pineapple'
fruits[-2:] = ['watermelon']
print(fruits)

# Add to a list
fruits.append('strawberry')

# insert at specific index
fruits.insert(3, 'pomegranate')
print(fruits)

['peach', 'pineapple', 'mango', 'kiwi', 'watermelon']
['peach', 'pineapple', 'mango', 'pomegranate', 'kiwi', 'watermelon', 'strawberry']


In [None]:
#combine two lists
tropical = ['papaya', 'lemon']
fruits.extend(tropical)
print(fruits)

['peach', 'pineapple', 'mango', 'pomegranate', 'kiwi', 'watermelon', 'strawberry', 'papaya', 'lemon']


In [None]:
# can also use the += operator, which extends the list in place
fruits += tropical
print(fruits)

['peach', 'pineapple', 'mango', 'pomegranate', 'kiwi', 'watermelon', 'strawberry', 'papaya', 'lemon', 'papaya', 'lemon']


In [None]:
# removing elements
# remove an element by value
# remove() only gets rid of the first instance of the element specified
fruits.remove('papaya')
print(fruits)

# remove an item by index with pop(); the default index is the last element
fruits.pop()
fruits.pop(3)


['peach', 'pineapple', 'mango', 'pomegranate', 'kiwi', 'watermelon', 'strawberry', 'lemon', 'lemon']


'pomegranate'

In [None]:
del fruits[2]

In [None]:
# Remove all occurrences of an element

# With a list comprehension
# fruits = [fruit for fruit in fruits if fruit != 'papaya']

# With the filter function (its first argument must be a function)
# fruits = list(filter(lambda fruit: fruit != 'pomegranate', fruits))
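
Both approaches from the comments above can be tried on a throwaway list. This is only a sketch, and `basket` is a hypothetical name that does not touch the notebook's `fruits`.

```python
basket = ['lemon', 'papaya', 'kiwi', 'papaya', 'lemon']  # hypothetical sample list

# List comprehension: keep everything that is not 'papaya'
no_papaya = [fruit for fruit in basket if fruit != 'papaya']

# filter() takes a function first, then the iterable, and returns a lazy iterator,
# so it is wrapped in list() to get a list back
no_lemon = list(filter(lambda fruit: fruit != 'lemon', basket))

print(no_papaya)
print(no_lemon)
print(basket)  # the original list is unchanged by either approach
```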



In [None]:
# Pythonic list looping
# use the plural/singular forms of the noun (for fruit in fruits)

for fruit in fruits:
  print(fruit)

print()

# we can also loop by index with the range() and len() functions
for i in range(len(fruits)):
  print(fruits[i])

apple
cherry
grape
kiwi
mango
orange
peach

apple
cherry
grape
kiwi
mango
orange
peach


In [None]:
# for unknown lengths
# pop() removes each item as it goes, so the list is empty afterwards
while fruits:
  print(fruits.pop())

peach
orange
mango
kiwi
grape
cherry
apple


In [None]:
# list comprehensions
capitalized_fruits = [fruit.title() for fruit in fruits]
print(capitalized_fruits)

a_fruits = [fruit for fruit in fruits if 'a' in fruit.lower()]

print(a_fruits)

['Apple', 'Cherry', 'Grape', 'Kiwi', 'Mango', 'Orange', 'Peach']
['apple', 'grape', 'mango', 'orange', 'peach']


In [None]:
# conditional replace: swap 'banana' for 'orange' and keep everything else
fruits_subbed = [fruit if fruit != 'banana' else 'orange' for fruit in fruits]
print(fruits_subbed)

['apple', 'cherry', 'grape', 'kiwi', 'mango', 'orange', 'peach']


In [None]:
# sorting lists
# list.sort() sorts in place (case-sensitive alphabetically, or numerically) and returns None
# can pass in a custom function with the key parameter
fruits.sort(reverse=True)
print(fruits)

['peach', 'orange', 'mango', 'kiwi', 'grape', 'cherry', 'apple']
None
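
The `key` parameter mentioned in the comments is not used in the cell above, so here is a minimal sketch of it. It uses a made-up list (`mixed`) with mixed casing to make the difference visible.

```python
mixed = ['kiwi', 'Apple', 'cherry', 'Mango', 'fig']  # hypothetical sample list

# sorted() returns a new list and leaves the original alone
default_order = sorted(mixed)                # uppercase letters sort before lowercase
ignore_case = sorted(mixed, key=str.lower)   # compare lowercased versions instead
by_length = sorted(mixed, key=len)           # shortest first; ties keep their original order

print(default_order)
print(ignore_case)
print(by_length)

# list.sort() sorts in place and returns None
result = mixed.sort(key=len, reverse=True)
print(result)
print(mixed)
```

Printing the return value of `sort()` is an easy way to end up with a stray `None` in the output. Use `sorted()` when a new list is needed.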


In [None]:
# Copying lists
# Can't use list2 = list1 to copy a list
# list2 will just reference list1

fruits2 = fruits
print(fruits2)
fruits.append('pear')
print(fruits2)

['peach', 'orange', 'mango', 'kiwi', 'grape', 'cherry', 'apple']
['peach', 'orange', 'mango', 'kiwi', 'grape', 'cherry', 'apple', 'pear']


In [None]:
# copy() method creates an independent (shallow) copy
fruits2 = fruits.copy()
print(fruits2)
fruits.append('lemon')
print(fruits2)

['peach', 'orange', 'mango', 'kiwi', 'grape', 'cherry', 'apple', 'pear']
['peach', 'orange', 'mango', 'kiwi', 'grape', 'cherry', 'apple', 'pear']


In [None]:
# use list() to create a copy
fruits2 = list(fruits)

In [None]:
# slicing the whole list also creates a copy
fruits2 = fruits[:]
fruits.append('lime')
print(fruits2)

['peach', 'orange', 'mango', 'kiwi', 'grape', 'cherry', 'apple', 'pear', 'lemon']
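
The difference between a reference and a copy can be checked with `is`. The sketch below uses hypothetical lists (`original`, `nested`). It also shows one caveat: `copy()`, `list()` and `[:]` all make shallow copies, so nested lists are still shared unless `copy.deepcopy()` is used.

```python
import copy

original = ['peach', 'orange', 'mango']  # hypothetical sample list
alias = original            # same list object under a second name
shallow = original.copy()   # new list object with the same items

original.append('kiwi')
print(alias is original, alias)      # the alias sees the append
print(shallow is original, shallow)  # the copy does not

# Shallow copies share nested objects
nested = [['apple', 'pear'], ['lime']]
shallow_nested = nested.copy()
deep_nested = copy.deepcopy(nested)

nested[0].append('plum')
print(shallow_nested[0])  # the inner list is shared, so 'plum' shows up
print(deep_nested[0])     # deepcopy duplicated the inner list
```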


#### File Operations

Python has basic file reading/writing capabilities.
These can be useful for scripting or data exploration.

- The open() function has 4 main modes
- "r" - Read - Default value. Opens a file for reading; raises an error if the file doesn't exist
- "a" - Append - Opens a file for appending; creates the file if it doesn't exist
- "w" - Write - Opens a file for writing and overwrites any existing content; creates the file if it doesn't exist
- "x" - Create - Creates the specified file; raises an error if the file already exists

We can also specify how the file contents are handled
- "t" - Text - Default value. Text mode
- "b" - Binary - Binary mode (e.g. images)

If we don't pass a mode argument, the default is 'rt'
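
The modes listed above can be tried safely in a temporary directory. This is only a sketch: the file name `modes_demo.txt` and the variable names are made up for the example, and `tempfile` is used so nothing on disk gets overwritten.

```python
import os
import tempfile

create_failed = False

# Work in a throwaway directory that is deleted when the block ends
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    with open(path, "x") as f:   # create; fails if the file already exists
        f.write("first\n")

    with open(path, "a") as f:   # append to the end of the file
        f.write("second\n")

    with open(path) as f:        # no mode given, so this is "rt"
        after_append = f.read()

    with open(path, "w") as f:   # discard the old content, then write
        f.write("replaced\n")

    with open(path, "rb") as f:  # binary mode returns bytes instead of str
        raw = f.read()

    try:
        open(path, "x")          # the file exists now, so "x" raises
    except FileExistsError:
        create_failed = True

print(after_append)
print(raw)
print(create_failed)
```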

The code below will not run as-is: file_path is a placeholder and needs to point to an existing text file.

In [None]:
file_path = '/file/file/file'

# open the file and assign it to a variable
# make sure to close the file once done

f = open(file_path, 'rt')

book = f.read()
print(book[:50])

In [None]:
# since we already read the file, our current position is at the end of the file.
# use f.seek(offset) to move to a specific position; f.seek(0) goes back to the start
f.seek(0)
book = f.readlines()

len(book)

print(book[1])

In [None]:
# close the file
f.close()
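
Since `file_path` above is only a placeholder, here is a self-contained sketch of the same read / seek / readlines sequence. It writes a small temporary file first; the file name `book.txt` and its contents are invented for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "book.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("Chapter 1\nIt was a dark night.\nThe end.\n")

    with open(path) as f:
        whole = f.read()      # reads everything and leaves the position at the end
        leftover = f.read()   # nothing left to read, so this is an empty string
        f.seek(0)             # go back to the start of the file
        lines = f.readlines() # list of lines, each keeping its trailing newline
        f.seek(0)
        # Iterating over the file object reads one line at a time
        stripped = [line.rstrip("\n") for line in f]

print(repr(leftover))
print(len(lines))
print(lines[1])
print(stripped)
```

Iterating over the file object is usually preferable to `readlines()` for large files, since it does not load every line into memory at once.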

In [None]:
# Better alternative for opening files
# The with statement closes the file automatically, even if an error occurs

# with open(file_path) as f:
#   lines = f.readlines()
# print(lines[:4])


# writing files
# choose append 'a' or write 'w' mode
# 'w' will overwrite any existing content

In [None]:
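# Assumes Google Drive is mounted at /content/drive (Colab);
# otherwise open() raises FileNotFoundError
out_file = '/content/drive/MyDrive/sample.txt'
line1 = 'This is some file content!\n'
line2 = ['This is some more file content.\n', 'And even more!\n']
line3 = 'This will be the only line in the file.'

# 'w' creates the file (or overwrites it) and writes the first line
with open(out_file, 'w') as f:
  f.write(line1)

# 'a' appends; writelines() writes each string in the list as-is
with open(out_file, 'a') as f:
  f.writelines(line2)

with open(out_file) as f:
  print(f.read())

# 'w' again discards everything written so far
with open(out_file, 'w') as f:
  f.write(line3)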



FileNotFoundError: ignored