# Python String Variables

https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str

In [None]:
#This cell changes the notebook's default behavior of showing only
#the last item in a cell, so that every value in a cell is displayed.


from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

Let's talk about data in memory on our computer.  It's all binary, but we can look at, or "interpret", the data in different ways.

In [None]:
x = b"\x70\x79\x74\x68\x6f\x6e"


#As binary 
format(int.from_bytes(x, byteorder="big"),"048b")

#As hex
format(int.from_bytes(x, byteorder="big"),"012x")

#As decimal numbers
print(",".join([str(num) for num in x]))

#As characters
print(x.decode())

#But also as code!
print("Print is just bytes in memory", print)

#And EXEs and DLLs, and JPEGs (images) and PDFs, etc.

There are a couple of different ways we can interpret that binary data in memory as strings. As discussed in more detail in our Data Representation lecture, we have:

ASCII   - https://www.asciitable.com/

UTF-16  - Often found in the registry, office documents and other data files.

UTF-8  - The default encoding Python uses when converting between strings and bytes with `.encode()` and `.decode()`
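As a rough illustration of how these encodings differ, the sketch below encodes the same short text three ways. The variable names are made up for this example.

```python
# The same two characters, encoded three different ways.
text = "Hi"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")   # little endian, no byte order mark

print(ascii_bytes)    # b'Hi'
print(utf8_bytes)     # b'Hi'  (plain ASCII text is identical in UTF-8)
print(utf16_bytes)    # b'H\x00i\x00'  (two bytes per character here)
```

Notice that the ASCII and UTF-8 results match for plain English text, while UTF-16 adds a zero byte after each of these characters.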

# Python String Types

Python has several String Type Variables.  Here are some of the most commonly used ones and their purposes:

str - This is the most common string type in Python, and it is used for working with Unicode text.  Each character is a Unicode code point, and UTF-8 is the default encoding when these values are converted to bytes.

bytes - Every 8 bits are stored as a value between 0 and 255. No interpretation is done on them unless they are printed or viewed. When printed or viewed, Python shows bytes that match printable ASCII characters as those characters and shows the other bytes as escapes such as `\xf0`. This is used for working with byte data, such as images, audio, and binary file formats.

bytearray - This is similar to bytes, but it is "mutable", meaning that you can change its contents.
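Here is a small sketch of how the three types behave when you index them. The variable names `text`, `raw` and `buf` are just for illustration.

```python
text = "AB"               # str
raw = b"AB"               # bytes
buf = bytearray(b"AB")    # bytearray

print(type(text), type(raw), type(buf))

print(text[0])   # 'A'  indexing a str gives a one-character str
print(raw[0])    # 65   indexing bytes gives an int between 0 and 255

buf[0] = 0x5A    # a bytearray allows assignment to a position
print(buf)       # bytearray(b'ZB')
```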


## String Methods

In [None]:
s = "   hello world!   "
s

In [None]:
# Define a string
s = "   hello world!   "

# Capitalize the first letter of the string
t = s.capitalize()
print("Capitalized string:", t)

# Convert the entire string to uppercase
t = s.upper()
print("Uppercase string:", t)

# Convert the entire string to lowercase
t = s.lower()
print("Lowercase string:", t)

# Strip leading and trailing whitespace from the string
t = s.strip()
print("Stripped string:", t)

# Replace all occurrences of "l" with "x"
t = s.replace("l", "x")
print('Replacing "l" with "x":', t)

# Split the string into a list of words
t = s.split()
print("Split string:", t)

# Join the list of words into a single string, separated by commas
t = ",".join(t)
print("Joined string:", t)

# Check if the string starts with "hello"
if s.startswith("hello"):
    print("The string starts with 'hello'")
else:
    print("The string does not start with 'hello'")

# Check if the string ends with "world"
if s.endswith("world"):
    print("The string ends with 'world'")
else:
    print("The string does not end with 'world'")


It is important to recognize that these methods DO NOT change the value in the variable. Instead they produce NEW strings that show the changes.

In [None]:
x = '    hello world    '
x.upper()

#Note that x does not change!
print(x)

#If we want to change the contents of `x` we have to reassign it.
x = x.upper()
print(x)

#This behavior of NOT updating the variable is required of all "immutable" variable types.

## String Slicing

Python Slicing operations allow us to extract a substring from a larger string.

string_object[start:stop:step]

 - start, stop and step are optional values.
 - If no start is provided it is the same as starting at 0.
 - If no stop is provided it will go all the way to the end of the string.
 - If no step is provided it will assume a step of 1.
 - If you specify a stop it must come after the first colon.
 - If you specify a step it must come after the second colon.
 - When step is negative the slice walks backward: a missing start defaults to the end of the string, a missing stop runs through the beginning of the string, and stop is still "up to but not including".
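A quick sketch of how the stop value behaves when the step is negative. The variable names are only for this example.

```python
s = "Hello World!"

up_to_one = s[4:0:-1]      # indexes 4, 3, 2, 1 (index 0 is NOT included)
through_zero = s[4::-1]    # indexes 4, 3, 2, 1, 0
middle = s[5:2:-1]         # indexes 5, 4, 3

print(repr(up_to_one))     # 'olle'
print(repr(through_zero))  # 'olleH'
print(repr(middle))        # ' ol'
```

To include the first character when walking backward, leave the stop empty rather than writing 0.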


In [None]:
s = 'Hello World!'
s

In [None]:
# Define a string
s = "Hello World!"

# Extract the first character of the string
t = s[0]
print("First character:", t)

# Extract the last character of the string
t = s[-1]       
print("Last character:", t)

# Extract a substring from index 2 to index 6 (exclusive)
t = s[2:6]      #means s[2:6:1]
print("Substring [2:6]:", t)

# Extract a substring from index 2 to the end of the string
t = s[2:]      #means s[2:end:1]
print("Substring [2:]:", t)

# Extract a substring from the start of the string to index 6 (exclusive)
t = s[:6]    #means s[0:6:1]
print("Substring [:6]:", t)

# Extract a substring from the start of the string to the end of the string
t = s[:]    # means s[0:end:1]
print("Substring [:]:", t)

# Extract every other character starting from index 0
t = s[::2]    # means s[0:end:2]
print("Every other character [::2]:", t)

# Extract every other character starting from index 1
t = s[1::2]      #means s[1:end:2]
print("Every other character [1::2]:", t)

# Extract the string in reverse order
t = s[::-1]      #start defaults to the last character, stop runs through index 0
print("Reversed string [::-1]:", t)

# Extract every other character in reverse order
t = s[::-2]      #same defaults as above, taking every second character
print("Every other character in reverse order [::-2]:", t)

# Extract a substring from index -6 to index -1 (exclusive)
t = s[-6:-1]    #means s[-6:-1:1]
print("Substring [-6:-1]:", t)

# Extract a substring from index -6 to the end of the string
t = s[-6:]
print("Substring [-6:]:", t)

# Extract a substring from the start of the string to index -7 (exclusive)
t = s[:-7]
print("Substring [:-7]:", t)


### Basic String Operations

Here are some examples of common operations you might perform on strings

In [None]:
# Create a string
s = "Hello, World!"

# Get the length of the string
n = len(s)
print("Length of string:", n)


In [None]:

# Concatenate two strings
s1 = "Hello"
s2 = "World"
s3 = s1 + " " + s2
print("Concatenated string:", s3)


In [None]:

# Repeat a string
s4 = "Ha" * 4
print("Repeated string:", s4)


In [None]:

# Replace a substring with another substring
s5 = s.replace("World", "Universe")
print("Replaced string:", s5)


In [None]:

# Convert a number to a string
x = 42
s6 = str(x)
print("Converted number to string:", s6)


In [None]:

# Convert a string to a float
s7 = "3.14159"
x = float(s7)
print("Converted string to a float:", x)


In [None]:

# Convert the first letter of each word in a string to uppercase
s8 = "hello world"
s9 = s8.title()
print("Title-cased string:", s9)


In [None]:

# Convert all the letters in a string to uppercase
s10 = "hello world"
s11 = s10.upper()
print("Uppercase string:", s11)


In [None]:

# Convert all the letters in a string to lowercase
s12 = "HELLO WORLD"
s13 = s12.lower()
print("Lowercase string:", s13)


In [None]:

# Check if a substring is present in a string
s14 = "Hello, World!"
if "World" in s14:
    print("Substring found!")


In [None]:

# Find the index of the first occurrence of a substring
s15 = "Hello, World!"
i = s15.find("World")
print("Index of first occurrence of substring:", i)


In [None]:

# Remove whitespace from the beginning and end of a string
s16 = "  Hello, World!  "
s17 = s16.strip()
print("Stripped string:", s17)


In [None]:

# Split a string into a list of substrings
s18 = "apple,banana,orange"
s19,s20,s21 = s18.split(",")
print("The third value upper case is :", s21.upper())


In [None]:

# Join a list of substrings into a single string
l = ["apple", "banana", "orange"]
s22 = ",".join(l)
print("Joined string:", s22)


### Examples of fstrings

F-strings allow you to put variable names and/or expressions inside curly brackets within your strings.

{ variable : format-specifier}

Example:
```
x = "slim shady"
print(f"My name is {x:*^16}")
```

https://docs.python.org/3/library/string.html#formatspec

The way I remember this is "FALT":
 - F = Fill Character
 - A = Alignment <^>
 - L = Length of string to print
 - T = Type conversion to perform

Note: These same format specifiers are used as the second argument to the format() function.

```
x = 10
#We could write
print(f"{x:08b}")   #output: 00001010

#which is the same as
print(format(x, "08b"))  #output: 00001010
```
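To see each FALT piece in action, here is a short sketch. The values and variable names are made up for illustration.

```python
value = 10

# F = "0", A = ">", L = 8, T = "b" (binary)
binary_padded = f"{value:0>8b}"

# F = "*", A = "^", L = 12, T = "b"
binary_centered = f"{value:*^12b}"

# The same kind of specifier works with format(): F = "_", A = "<", L = 10, T = ".2f"
price_left = format(3.14159, "_<10.2f")

print(binary_padded)     # 00001010
print(binary_centered)   # ****1010****
print(price_left)        # 3.14______
```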

In [None]:
# define some variables
name = "Alice"
age = 30
height = 1.75
weight = 65.4
print(f"Her name is {name}")

In [None]:
# define some variables
name = "Alice"
age = 30
height = 1.75
weight = 65.4

# basic f-string
print(f"Her name is {name}.")

# integer formatting with minimum width
print(f"She is {age:02d} years old.")

# float formatting with precision
print(f"Her height is {height:.2f} meters.")

# float formatting with exponent notation
print(f"Her weight is {weight:e} kilograms.")

# string padding with center alignment
message = "Hello"
print(f"{message:-^20}")

# string padding with right alignment
message = "World"
print(f"{message:->20}")

# converting values to strings
x = 42
y = 3.14
print(f"x = {str(x)}")
print(f"y = {str(y)}")

# combining strings with join method
words = ["hello", "world"]
sentence = " ".join(words)
print(f"sentence = {sentence}")

# changing parts of a string with replace method
text = "The quick brown fox jumps over the lazy dog."
new_text = text.replace("fox", "cat")
print(f"new_text = {new_text}")
print(f"NOTICE: the original text didn't change {text}")


Remember that strings hold Unicode characters, and UTF-8 is the default encoding used to turn them into bytes. In UTF-8 a character takes between one and four bytes.
So a single glyph might take multiple bytes of storage to represent it.

We can turn a string into its underlying bytes by calling `.encode()`

`"string".encode()`

In [None]:
# Remember a single Glyph may be multiple bytes
unicode_char = '🐍'
print("This required 4 bytes! They are :",unicode_char.encode())

# Convert the 4 bytes to binary
binary_char = '  '.join(['{0:08b}'.format(byte) for byte in unicode_char.encode()])

# Print the binary representation so we can see them
print(binary_char)
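As a rough comparison, this sketch counts how many UTF-8 bytes a few different characters need. The list of sample characters is my own choice for illustration.

```python
# "A", e with an acute accent, the euro sign, and the snake emoji
chars = ["A", "\u00e9", "\u20ac", "\U0001F40D"]

sizes = []
for ch in chars:
    encoded = ch.encode("utf-8")
    sizes.append(len(encoded))
    # len(ch) counts characters, len(encoded) counts bytes
    print(ch, len(ch), len(encoded), encoded)

print(sizes)   # [1, 2, 3, 4]
```

Each of these is one character as far as `len()` on the string is concerned, but the byte counts differ.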

Reading a file that does not contain UTF-8 text and interpreting its bytes as UTF-8 will usually raise a `UnicodeDecodeError` and stop your program!

Not all byte sequences can be interpreted as strings.

Example: x =  b'\xf0\x9f\x78\x8d'

In [None]:
unicode_char = b'\xf0\x9f\x78\x8d'
print("These are 4 bytes, but they are not valid UTF-8 :",unicode_char)

# Convert the 4 bytes to binary
binary_char = '  '.join(['{0:08b}'.format(byte) for byte in unicode_char])

# Print the binary representation so we can see them
print(binary_char)

So what happens when you try to interpret this byte sequence as a string?  It raises a `UnicodeDecodeError`!

In [None]:
#Convert bytes into a string
unicode_char.decode()

There are lots of ways to end up with bytes that need to be interpreted as strings, for example when we read a file off the disk in binary mode but want to treat its contents as text. In Python, if you have bytes in a variable and want to turn the contents into a string, use the `.decode()` method shown above.
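Here is a minimal sketch of reading a file as bytes and then decoding it as a separate step. The temporary file and its contents are invented for this example, and `errors="replace"` is just one of several ways you might choose to handle bad bytes.

```python
import os
import tempfile

# Write a few bytes that are not valid UTF-8 to a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"\xff\xfeABC")
    path = handle.name

# Binary mode ("rb") returns bytes and never attempts to decode them.
with open(path, "rb") as handle:
    raw = handle.read()
print(raw)

# Decoding is a separate, explicit step that can fail.
try:
    text = raw.decode("utf-8")
except UnicodeDecodeError as error:
    print("Not valid UTF-8:", error)
    # Swap each undecodable byte for the U+FFFD replacement character.
    text = raw.decode("utf-8", errors="replace")
print(text)

os.remove(path)
```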

A common mistake for new Python developers is to try to use str(). That doesn't decode the bytes. It just returns their printed representation, including the leading b and the quotes.


In [None]:
#Don't try to convert it this way.

print(str(unicode_char))
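To make the difference concrete, here is a small sketch comparing `str()` with `.decode()`. The variable names are only for illustration.

```python
data = b"\x41\x41"

as_repr = str(data)             # the printed representation, not the text
as_text = data.decode()         # decodes the bytes as UTF-8
also_text = str(data, "utf-8")  # str() only decodes when given an encoding

print(repr(as_repr))    # "b'AA'"
print(repr(as_text))    # 'AA'
print(repr(also_text))  # 'AA'
```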

Strings are converted to bytes with `.encode()`

"String".encode()

Bytes are converted to string with `.decode()`

b"\x41\x41".decode()

How do I remember this?  Bytes have a "b" aka a backwards "d" in front of them.  To get rid of the backwards "d" I .decode()

Therefore we cannot read .EXEs, .DLLs, .MOVs, .JPGs, .PDFs, .PCAPs or any other type of binary data as strings!

We need a different type of variable that doesn't try to do any interpretation on the bytes.  It should just treat each individual byte as a value between 0-255.

This new type of variable is called "bytes".

You create them by putting a lowercase "b" in front of the opening quote of your string.

Typically bytes are written as hex values such as

`my_bytes = b'\xf0\x9f\x90\x8d'`

But Python will also allow you to type characters instead of their hex values.

`my_bytes = b"HELLO"`

Internally these are not treated any differently.  They are still just stored as binary numbers associated with those characters.
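A quick check of that claim, as a sketch with made-up variable names:

```python
typed = b"HELLO"
hexed = b"\x48\x45\x4c\x4c\x4f"

print(typed == hexed)   # True
print(list(typed))      # [72, 69, 76, 76, 79]
print(list(hexed))      # [72, 69, 76, 76, 79]
```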

When bytes are printed, Python will mix the hex output and the character output. If a byte value has a printable ASCII character associated with it then it prints the character, otherwise it prints the byte's value in hex.

In [None]:
#This will show you how all possible byte values are printed
 
all_bytes = bytes(range(256))
print(all_bytes)

In [None]:
# Create a byte object with an initial value
x = b"Hello, World!"

# Decode bytes into a string
print("Decode bytes into a string:", x.decode())


In [None]:

# Encode string into bytes using ASCII encoding
s = "Hello, World!"
x = s.encode("ascii")
print("Encode string into bytes using ASCII encoding:", x)


In [None]:

# Get the length of the byte object
print("Get the length of the byte object:", len(x))


In [None]:

# Convert byte object to a list of integers
print("Convert byte object to a list of integers:", list(x))


In [None]:

# Get the index of the first occurrence of a byte in the byte object
print("Get the index of the first occurrence of a byte in the byte object:", x.index(b"o"))


In [None]:

# Count the number of occurrences of a byte in the byte object
print("Count the number of occurrences of a byte in the byte object:", x.count(b"o"))


In [None]:

# Replace all occurrences of a byte with another byte in the byte object
print("Replace all occurrences of a byte with another byte in the byte object:", x.replace(b"o", b"i"))


In [None]:

# Convert all bytes to lowercase
print("Convert all bytes to lowercase:", x.lower())


In [None]:

# Convert all bytes to uppercase
print("Convert all bytes to uppercase:", x.upper())


In [None]:

# Check if all bytes are alphanumeric
print("Check if all bytes are alphanumeric:", x.isalnum())

In [None]:

# Check if all bytes are alphabetical
print("Check if all bytes are alphabetical:", x.isalpha())

In [None]:

# Check if all bytes are digits
print("Check if all bytes are digits:", x.isdigit())

# Check if all bytes are printable characters
# (bytes has no isprintable() method, so decode to a string first)
print("Check if all bytes are printable characters:", x.decode().isprintable())

# Check if all bytes are whitespace characters
print("Check if all bytes are whitespace characters:", x.isspace())

# Return a new byte object with the bytes reversed
print("Return a new byte object with the bytes reversed:", x[::-1])
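Alongside the methods above, bytes can also be converted to and from hex text. This is a short sketch using `bytes.hex()` and `bytes.fromhex()`. The variable names are mine.

```python
data = b"python"

hex_text = data.hex()                    # bytes -> hex string
round_trip = bytes.fromhex(hex_text)     # hex string -> bytes
spaced = bytes.fromhex("f0 9f 90 8d")    # spaces between byte pairs are allowed

print(hex_text)     # 707974686f6e
print(round_trip)   # b'python'
print(spaced)       # b'\xf0\x9f\x90\x8d'
```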


Just like strings, none of these methods update the bytes stored in our variable.

In [None]:
#Create bytes
x = b'\xf0\x9f\x90\x8d'
#replace one of the bytes
x.replace(b"\x90",b"\x78")
#This did not update the value in `x` it produced a new set of bytes
print(x)

#To keep the change I have to produce a new set of bytes and reassign x.
x = x.replace(b"\x90",b"\x78")
print(x)

You also can't change parts of strings or bytes by assigning to individual positions.

In [None]:
#Create a string
x = "HOT PYTHON!"
#Retrieve the character in position 0
print("Retrieving character in position 0", x[0])
#Try to change the character in position 0 producing an error
x[0] = "N"

This is because strings and bytes are what we call "immutable" types. You cannot change parts of the values, you can only produce new values and reassign the variables.
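One way to get the effect of "changing" a character is to build a new string and reassign it. This is only a sketch, and there are other approaches that work just as well.

```python
x = "HOT PYTHON!"

# Build a new string from a new first character plus the rest of the old one
fixed = "N" + x[1:]
print(fixed)      # NOT PYTHON!

# replace() with a count of 1 changes only the first match
also_fixed = x.replace("H", "N", 1)
print(also_fixed) # NOT PYTHON!

# The original is untouched until we reassign it
print(x)          # HOT PYTHON!
x = fixed
```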

However, Python does have a third "string like" type called bytearray. A bytearray holds bytes whose individual values you can change in place. Types like this are called "mutable".

Mutable variables act very differently than immutable ones. We will explore this in more detail when we look at another mutable type called "lists" later.  For now, let's just see how to create and use byte arrays.

In [None]:
# Creating a bytearray and initializing with a string
my_str = "Hello, world!"
my_bytearray = bytearray(my_str, encoding='utf-8')
print("Bytearray created with a string:", my_bytearray)


In [None]:

# Creating a bytearray and initializing it with bytes
my_bytes = b"Hello, world!"
my_bytearray = bytearray(my_bytes)
print("Bytearray created with a bytes:", my_bytearray)

In [None]:

# Accessing an element in the bytearray
print("The element at index 5 is:", my_bytearray[5])

# Modifying an element in the bytearray
my_bytearray[5] = 100
print("The element at index 5 after modification is:", my_bytearray[5])


In [None]:

# Slicing the bytearray
print("The slice of the bytearray from index 2 to 6 is:", my_bytearray[2:6])

# Reversing the bytearray
my_bytearray.reverse()
print("The reversed bytearray is:", my_bytearray)

# Converting the bytearray to a list
my_list = list(my_bytearray)
print("The bytearray converted to a list is:", my_list)
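As a small preview of why mutable types act differently, this sketch shows two names pointing at the same bytearray, and how to turn a bytearray back into immutable bytes. The variable names are invented for this example.

```python
original = bytearray(b"Hello")
alias = original              # a second name for the SAME bytearray
copy = bytearray(original)    # an independent copy

original[0] = ord("J")        # change the first byte in place

print(alias)   # bytearray(b'Jello')  the alias sees the change
print(copy)    # bytearray(b'Hello')  the copy does not

# Convert back to immutable bytes when you are done editing
frozen = bytes(original)
print(frozen)  # b'Jello'
```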