# Data Types and Structures
#### Author: [Yuxiao (Rain) Luo](https://github.com/YuxiaoLuo)

- A data type is a classification attached to a piece of data. It tells the computer how to interpret the value and signals how the programmer intends to use it.

- Data types are common in many programming languages, and each data type has its own unique properties.

## Here is a list of common data types in Python:

1. Numeric Types:
    - `int`: integer, whole numbers (e.g., `1`, `2`, `-3`)
    - `float`: decimal numbers (e.g., `3.14`, `-2.7`)
    - `complex`: complex numbers (e.g., `1 + 2j`)
2. Sequence Types:
    - `str`: string, a sequence of characters (e.g., `"Hello, World!"`)
    - `list`: an ordered, mutable collection of items (e.g., `[1, 2, "three"]`)
    - `tuple`: an ordered, immutable collection of items (e.g., `(1, 2, "three")`)
3. Mapping Type:
    - `dict`: dictionary, a collection of key-value pairs (e.g., `{"name": "Rain", "age": 30}`)
4. Set Types:
    - `set`: an unordered collection of unique items (e.g., `{1, 2, 3}`)
    - `frozenset`: an unordered, immutable collection of unique items (e.g., `frozenset({1, 2, 3})`)
5. Boolean Type:
    - `bool`: boolean, a logical value (`True` or `False`)
6. None Type:
    - `None`: the absence of a value (i.e., not available, N/A, NA)
    

*These are the rules you need to memorize, like you need to memorize the chords and notes when learning how to play a guitar.*

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/YuxiaoLuo/AI_Intro/blob/main/python_type_structure.ipynb)

In [62]:
# display multiple outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

### Check data type
`type()` is a handy function that returns the data type of any Python object.

In [12]:
# int
print(type(1))
# str
print(type("Hello, World!"))
# float
print(type(1.3))

<class 'int'>
<class 'str'>
<class 'float'>


In [13]:
type(1812312.1)

float

Notice the difference between the string `'True'` (written in quotes) and the boolean `True`.

In [11]:
# first True is str
print(type('True'))

# second True is bool
print(type(True))

<class 'str'>
<class 'bool'>
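
As a complementary check, the built-in `isinstance()` tests whether a value belongs to a given type. The short sketch below uses illustrative variable names that are not part of the original notebook. Note that `bool` is a subclass of `int` in Python, so `True` also passes an `int` check.

```python
# isinstance(value, type) returns True when the value is an instance of the type
is_str = isinstance("True", str)      # the quoted text is a string
is_bool = isinstance(True, bool)      # the unquoted keyword is a boolean
bool_is_int = isinstance(True, int)   # bool is a subclass of int

print(is_str, is_bool, bool_is_int)   # True True True

# The string 'True' is not a boolean
print(isinstance("True", bool))       # False
```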


### Assign value to variable
In Python, the data type is set when you assign a value to a variable.

In [22]:
# str
x = "CIS2350"
print(f"The value of variable 'x' is {x} and the data type is {type(x)}")

# int
x = 20
print(f"The value of variable 'x' is {x} and the data type is {type(x)}")

# float
x = 20.3434
print(f"The value of variable 'x' is {x} and the data type is {type(x)}")

# None
x = None
print(f"The value of variable 'x' is {x} and the data type is {type(x)}")

# bool
x = True
print(f"The value of variable 'x' is {x} and the data type is {type(x)}")

# dict
x = {"course" : "cis2350", "instructor" : "Dr. Luo"}
print(f"The value of variable 'x' is {x} and the data type is {type(x)}")

The value of variable 'x' is CIS2350 and the data type is <class 'str'>
The value of variable 'x' is 20 and the data type is <class 'int'>
The value of variable 'x' is 20.3434 and the data type is <class 'float'>
The value of variable 'x' is None and the data type is <class 'NoneType'>
The value of variable 'x' is True and the data type is <class 'bool'>
The value of variable 'x' is {'course': 'cis2350', 'instructor': 'Dr. Luo'} and the data type is <class 'dict'>


### Specify data type for variable
If you want to specify the data type explicitly, you can use the following constructor functions. Each constructor has the same name as the data type it creates.

In [30]:
x = int(20)
print(f"The value of variable 'x' is {x} and the data type is {type(x)}")

x = str("CIS course")
print(f"The value of variable 'x' is '{x}' and the data type is {type(x)}")

x = list(("apple", "banana", "cherry"))
print(f"The value of variable 'x' is '{x}' and the data type is {type(x)}")

The value of variable 'x' is 20 and the data type is <class 'int'>
The value of variable 'x' is 'CIS course' and the data type is <class 'str'>
The value of variable 'x' is '['apple', 'banana', 'cherry']' and the data type is <class 'list'>


These constructors convert a value to the target type whenever a conversion is defined, even if the result looks surprising.

In [35]:
# 20.5 should be a floating point value rather than an integer by nature
# int() will convert it to 20 to make it an integer
x = int(20.5)
print(f"The value of variable 'x' is {x} and the data type is {type(x)}")

# 5 should be an integer rather than a boolean by nature
# bool() will convert it to 'True' to make it a boolean value
x = bool(5)
print(f"The value of variable 'x' is '{x}' and the data type is {type(x)}")

The value of variable 'x' is 20 and the data type is <class 'int'>
The value of variable 'x' is 'True' and the data type is <class 'bool'>
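
A few more conversions, sketched here with illustrative values, show what the constructors actually do: `int()` truncates toward zero rather than rounding, `bool()` returns `False` for zero and empty values, and a string that does not look like a number raises a `ValueError`. The name `conversion_error` is only used for this example.

```python
# int() drops the fractional part (truncates toward zero); it does not round
print(int(20.9))    # 20
print(int(-20.9))   # -20

# bool() is False for zero and empty values, True for everything else
print(bool(0), bool(""), bool([]))   # False False False
print(bool(5), bool("0"))            # True True

# Not every value can be converted: this string is not a valid integer
try:
    int("CIS2350")
except ValueError as err:
    conversion_error = str(err)
    print("Conversion failed:", conversion_error)
```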


### Variable names and keywords

1. Programmers generally choose names for their variables that are meaningful and document what the variable is used for.

2. Variable names can be arbitrarily long. They can contain both letters and numbers, but they cannot start with a number. 

3. It is legal to use uppercase letters, but it is a good idea to begin variable names with a lowercase letter (you’ll see why later).

4. The underscore character ( _ ) can appear in a name. It is often used in names with multiple words, such as `my_name` or `airspeed_of_unladen_swallow`. 

5. Variable names can start with an underscore character, but we generally avoid doing this unless we are writing library code for others to use.

`2350cis` is illegal because it begins with a number. `cis@` is illegal because it contains an illegal character, `@`. But what’s wrong with `if`? It turns out that `if` is one of Python’s keywords. 

The interpreter uses keywords to recognize the structure of the program, and they cannot be used as variable names.

Python reserves 35 keywords:



```
False      await      else       import     pass
None       break      except     in         raise
True       class      finally    is         return
and        continue   for        lambda     try
as         def        from       nonlocal   while
assert     del        global     not        with
async      elif       if         or         yield
```

*Avoid using these names for your variables*
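
The naming rules above can also be checked in code. The sketch below combines the standard-library `keyword` module with `str.isidentifier()`; the helper name `is_valid_variable_name` is made up for this illustration.

```python
import keyword

def is_valid_variable_name(name):
    """Return True if name is a legal identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

# Try the example names discussed in this section
for candidate in ["my_name", "2350cis", "cis@", "if"]:
    print(candidate, "->", is_valid_variable_name(candidate))

# keyword.kwlist holds the reserved keywords of the running Python version
print(len(keyword.kwlist))
```

Only `my_name` passes both checks: `2350cis` and `cis@` are not legal identifiers, and `if` is a keyword.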

## Data structures

There are 4 main data structures in Python:
- `list`
- `set`
- `tuple`
- `dict` 

Data structures are “containers” that organize and group data according to type. The data structures differ based on mutability and order.
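
As a rough preview of how mutability and order differ, the sketch below builds a list, a tuple, and a set from the same illustrative values (the variable names are not from the notebook).

```python
items = [3, 1, 3, 2]

as_list = list(items)    # ordered, mutable, keeps duplicates
as_tuple = tuple(items)  # ordered, immutable, keeps duplicates
as_set = set(items)      # unordered, mutable, drops duplicates

as_list[0] = 99          # lists can be changed in place
print(as_list)           # [99, 1, 3, 2]

try:
    as_tuple[0] = 99     # tuples cannot be changed in place
except TypeError as err:
    print("tuple error:", err)

# The set kept only the unique values
print(len(as_tuple), len(as_set))   # 4 3
```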

### `list` operations
1. Create a list with `[]`
2. Add items to the end of a list
3. Remove the last item from a list

In [5]:
# Create an empty list
empty_list = []
print(empty_list)

[]


In [2]:
# List with items
numbers = [1, 2, 3, 4, 5]
print("List of Numbers:", numbers)

List of Numbers: [1, 2, 3, 4, 5]


In [3]:
# List with mixed data types
mixed_list = [1, "Hello", 3.14, True]
print("Mixed List:", mixed_list)

Mixed List: [1, 'Hello', 3.14, True]


In [4]:
# list.append(x) :add an item to the end of the list
numbers.append(100) 
print(numbers)

mixed_list.append("Dr. Luo!")
print(mixed_list)

[1, 2, 3, 4, 5, 100]
[1, 'Hello', 3.14, True, 'Dr. Luo!']


In [5]:
# list.pop(): removes and returns the last item in the list
mixed_list.pop()
print(mixed_list)

[1, 'Hello', 3.14, True]


4. Accessing Items in a List

In [6]:
# Accessing items by index (indexing starts at 0 in Python)
print("First item in numbers:", numbers[0])
print("Second item in mixed_list:", mixed_list[1])

First item in numbers: 1
Second item in mixed_list: Hello


5. Slicing a List

In [7]:
# Slicing to get a subset of items
print("First three items in numbers:", numbers[:3])

First three items in numbers: [1, 2, 3]


In [8]:
print("Items from index 2 to the end:", numbers[2:])

Items from index 2 to the end: [3, 4, 5, 100]
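
Slices also accept negative indices and an optional step. Here is a small sketch with an illustrative list, named `letters` so that it does not overwrite `numbers`:

```python
letters = ["a", "b", "c", "d", "e", "f"]

print(letters[-1])     # last item: f
print(letters[1:4])    # items at index 1, 2, 3: ['b', 'c', 'd']
print(letters[::2])    # every second item: ['a', 'c', 'e']
print(letters[::-1])   # reversed copy: ['f', 'e', 'd', 'c', 'b', 'a']
```

The stop index is excluded, which is why `letters[1:4]` returns three items rather than four.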


6. Modifying a List

In [9]:
print(numbers)

[1, 2, 3, 4, 5, 100]


In [10]:
# Changing an item by index
numbers[0] = 10
print("Modified numbers list:", numbers)

Modified numbers list: [10, 2, 3, 4, 5, 100]


#### `list` methods (remove, insert, locate position of each component, extend list)
- retrieve item index
    - `list.index(x)` returns the index of the first item equal to x

- add items to list 
    - `list.append(x)` add an item to the end of the list
    - `list.insert(i, x)` insert an item x at a given position i
    - `list.extend(iterable)` extend the list by appending all the items from the iterable
- remove items
    - `list.pop()` removes and returns the last item in the list
    - `del list[i]` delete an item at a given position i
    - `list.remove(x)` remove the first item from the list whose value is equal to x
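
A minimal sketch of `index()`, `del`, and `remove()` from the list above, using an illustrative list called `colors`:

```python
colors = ["red", "green", "blue", "green"]

# index() returns the position of the first matching item
first_green = colors.index("green")
print(first_green)   # 1

# del removes the item at a given position without returning it
del colors[0]
print(colors)        # ['green', 'blue', 'green']

# remove() deletes only the first matching value
colors.remove("green")
print(colors)        # ['blue', 'green']

# index() raises ValueError when the value is not in the list
try:
    colors.index("purple")
except ValueError as err:
    print("Not found:", err)
```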

7. Adding Items to a List

In [11]:
# Adding a single item
numbers.append(6)
print("List after appending 6:", numbers)

List after appending 6: [10, 2, 3, 4, 5, 100, 6]


In [12]:
# Adding multiple items
numbers.extend([7, 8, 9])
print("List after extending with [7, 8, 9]:", numbers)

List after extending with [7, 8, 9]: [10, 2, 3, 4, 5, 100, 6, 7, 8, 9]


In [13]:
# Inserting an item at a specific index
numbers.insert(1, 15)
print("List after inserting 15 at index 1:", numbers)

List after inserting 15 at index 1: [10, 15, 2, 3, 4, 5, 100, 6, 7, 8, 9]


8. Removing Items from a List

In [14]:
# Removing an item by value
numbers.remove(15)
print("List after removing 15:", numbers)

List after removing 15: [10, 2, 3, 4, 5, 100, 6, 7, 8, 9]


In [16]:
# Removing an item by index
removed_item = numbers.pop(2)
print("List after popping item at index 2:", numbers)
print("Removed item:", removed_item)

List after popping item at index 2: [10, 2, 4, 5, 100, 6, 7, 8, 9]
Removed item: 3


In [17]:
# Clearing a List
numbers.clear()
print("List after clearing:", numbers)

List after clearing: []


9. Common List Operations


In [18]:
# Using the `len()` function to get the length of a list
length = len(mixed_list)
print("Length of mixed_list:", length)

Length of mixed_list: 4


In [20]:
print(mixed_list)

[1, 'Hello', 3.14, True]


In [19]:
# Using `in` to check for membership
is_hello_in_list = "Hello" in mixed_list
print("Is 'Hello' in mixed_list?:", is_hello_in_list)

Is 'Hello' in mixed_list?: True


In [21]:
# Sorting a List (with numbers)
numbers = [5, 2, 8, 3, 1]
numbers.sort()
print("Sorted numbers:", numbers)

Sorted numbers: [1, 2, 3, 5, 8]


In [22]:
# Sorting in descending order
numbers.sort(reverse=True)
print("Numbers sorted in descending order:", numbers)

Numbers sorted in descending order: [8, 5, 3, 2, 1]
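
One detail worth knowing: `list.sort()` changes the list in place and returns `None`, while the built-in `sorted()` returns a new list and leaves the original alone. A short sketch with an illustrative list named `scores`:

```python
scores = [5, 2, 8, 3, 1]

# sorted() returns a new list and leaves the original unchanged
ordered = sorted(scores)
print(ordered)   # [1, 2, 3, 5, 8]
print(scores)    # [5, 2, 8, 3, 1]

# list.sort() sorts in place and returns None
result = scores.sort()
print(result)    # None
print(scores)    # [1, 2, 3, 5, 8]
```

A common mistake is writing `scores = scores.sort()`, which replaces the list with `None`.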


10. Iterating Over a List & List Comprehension

In [24]:
# Using a `for` loop
for item in mixed_list:
    print("Item:", item)

Item: 1
Item: Hello
Item: 3.14
Item: True


In [25]:
# Creating a new list with squares of numbers from 0 to 9
squares = [x**2 for x in range(10)]
print("Squares:", squares)

Squares: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


11. Nested Lists

In [27]:
# A list can contain other lists
nested_list = [[1, 2], [3, 4], [5, 6]]
print("Nested List:", nested_list)

Nested List: [[1, 2], [3, 4], [5, 6]]


In [28]:
# Accessing items in a nested list
print("First list in nested_list:", nested_list[0])
print("Second item in the first list:", nested_list[0][1])

First list in nested_list: [1, 2]
Second item in the first list: 2
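
To visit every item in a nested list, one loop can be placed inside another. The sketch below uses an illustrative copy of the list (named `nested`) and also shows the same traversal written as a list comprehension, which flattens the list.

```python
nested = [[1, 2], [3, 4], [5, 6]]

# Loop over each inner list, then over each item inside it
for row in nested:
    for value in row:
        print(value)

# The same traversal written as a list comprehension flattens the list
flat = [value for row in nested for value in row]
print(flat)   # [1, 2, 3, 4, 5, 6]
```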


#### List comprehension

A list comprehension is a shorter syntax for building a list with a `for` loop.

1. Let's create a new list based on the values of an existing list.
    - *output_list = [expression for item in existing_list]*

In [1]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [2]:
newlist = []
for x in range(10):
    newlist.append(x**2)
    print(newlist)

[0]
[0, 1]
[0, 1, 4]
[0, 1, 4, 9]
[0, 1, 4, 9, 16]
[0, 1, 4, 9, 16, 25]
[0, 1, 4, 9, 16, 25, 36]
[0, 1, 4, 9, 16, 25, 36, 49]
[0, 1, 4, 9, 16, 25, 36, 49, 64]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


In [3]:
newlist2 = [x**2 for x in range(10)]
newlist2

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

2. Let's create a new list from only the values in an existing list that satisfy a condition.
    - *output_list = [item for item in existing_list if (condition to be satisfied)]*

In [4]:
fruits = ["Apple", "banana", "cherry", "kiwi", "mango", "Avocado", "blackberry"]

newlist3 = [f for f in fruits if 'a' in f or 'A' in f]

print(newlist3)

['Apple', 'banana', 'mango', 'Avocado', 'blackberry']
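
The same filter can be written without listing both cases by lowercasing each name first. This variation is a sketch rather than part of the original notebook, and the names `with_a` and `upper_with_a` are illustrative.

```python
fruits = ["Apple", "banana", "cherry", "kiwi", "mango", "Avocado", "blackberry"]

# Lowercasing each name first makes the test case-insensitive
with_a = [f for f in fruits if "a" in f.lower()]
print(with_a)

# The expression part can also transform each item that passes the filter
upper_with_a = [f.upper() for f in fruits if "a" in f.lower()]
print(upper_with_a)
```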


### `tuple` operations
- Tuples are similar to lists but are immutable, meaning that elements cannot be added, removed, or changed after creation.
    - Create a `tuple` using `()`.
- When to use a tuple?
    - Choose a tuple if the elements don't change.
- Heterogeneous data:
    - Tuples can contain elements of different data types, such as integers, strings, and even other tuples.

In [5]:
# create tuple using ()
tup = (0,1,2,3,4)
print(tup)

(0, 1, 2, 3, 4)


In [6]:
# locate elements in tuple
tup[2]

2

In [8]:
# concatenate tuples with +; this returns a new tuple and leaves tup unchanged
tup + ("hello", 10, True)

(0, 1, 2, 3, 4, 'hello', 10, True)
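
Because tuples are immutable, `+` never modifies the original tuple. The sketch below uses illustrative names (`base`, `longer`, `single`) to show this, along with two related details: a one-item tuple needs a trailing comma, and a tuple can be unpacked into separate variables.

```python
base = (0, 1, 2, 3, 4)

# + builds a new tuple; the original is left untouched
longer = base + ("hello", 10, True)
print(base)     # (0, 1, 2, 3, 4)
print(longer)   # (0, 1, 2, 3, 4, 'hello', 10, True)

# A one-item tuple needs a trailing comma; (5) is just the integer 5
single = (5,)
not_a_tuple = (5)
print(type(single), type(not_a_tuple))

# Unpacking assigns each element to its own variable
x, y = (10, 20)
print(x, y)     # 10 20
```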

### Sets
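- Sets are also similar to lists, but they store an unordered collection of unique items, with no duplicates.
- A set drops duplicates automatically. It has no guaranteed order, so the order shown when printing may differ from the order in which the items were written.
- Create a set by putting items inside `{}`, e.g. `{1, 2, 3}`. An empty set must be created with `set()`, because `{}` creates an empty dictionary.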

In [9]:
#create an empty set
s1 = set()
print(s1)

set()


In [55]:
# a set drops duplicates; the printed order is not guaranteed
s2 = {3,3,5,7,8,2,10} 
s3 = {'A','a','b',3,5}

print(s2)
print(s3)

{2, 3, 5, 7, 8, 10}
{'A', 3, 5, 'b', 'a'}


In [56]:
# set.add()
s2.add(1)
print(s2)

{1, 2, 3, 5, 7, 8, 10}


In [57]:
# set.remove() removes an item from a set. 
# Will raise a KeyError exception if the specified item is not found in the set.
s2.remove(7)
print(s2)

{1, 2, 3, 5, 8, 10}
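
If the item might be missing, `set.discard()` is a safer alternative to `set.remove()` because it does nothing when the item is not found. A small sketch with an illustrative set named `s`:

```python
s = {1, 2, 3}

# remove() raises KeyError when the item is missing
try:
    s.remove(99)
except KeyError as err:
    print("KeyError:", err)

# discard() removes the item if present and does nothing otherwise
s.discard(99)
s.discard(2)
print(sorted(s))   # [1, 3]
```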


In [58]:
print(s2)

{1, 2, 3, 5, 8, 10}


In [59]:
# set.pop() removes an arbitrary item from the set (you cannot choose which one)
# This method returns the removed item.
print(s2.pop())
print(s2)

print(s2.pop())
print(s2)

1
{2, 3, 5, 8, 10}
2
{3, 5, 8, 10}


In [64]:
print(s2, s3)

{3, 5, 8, 10} {'A', 3, 5, 'b', 'a'}


In [68]:
#set operations
s2&s3           # intersection: items in both s2 and s3
s2|s3           # union: items in s2, s3, or both
s2-s3           # difference: items in s2 but not in s3
s2^s3           # symmetric difference: items in s2 or s3 but not both

{3, 5}

{10, 3, 5, 8, 'A', 'a', 'b'}

{8, 10}

{10, 8, 'A', 'a', 'b'}

In [71]:
(s2|s3) - (s2&s3) # union - intersection

{10, 8, 'A', 'a', 'b'}
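
Each operator also has a method form. The sketch below recreates the two sets under the illustrative names `a` and `b` and checks that the methods give the same results as the operators.

```python
a = {3, 5, 8, 10}
b = {"A", "a", "b", 3, 5}

common = a.intersection(b)               # same as a & b
combined = a.union(b)                    # same as a | b
only_in_a = a.difference(b)              # same as a - b
in_one_only = a.symmetric_difference(b)  # same as a ^ b

print(common == (a & b), combined == (a | b))        # True True
print(only_in_a == (a - b), in_one_only == (a ^ b))  # True True

# Unlike the operators, the methods accept any iterable, such as a list
print(sorted(a.union([1, 2])))   # [1, 2, 3, 5, 8, 10]
```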

#### Set comprehension

Similar to list comprehension.

In [80]:
#Create a set of the unique characters in the following string, excluding 'a', 'b', and 'c'
check_char = 'abracadabra'

#{char for char in check_char if char not in 'abc'}
{char for char in check_char if char not in ('a','b','c')}

{'d', 'r'}

### Dictionary

Use `{}` to create key-value pairs.

- Dictionaries store a mapping between a series of unique keys and values
- Dictionaries are indexed by their keys. Keys must be hashable, which in practice means immutable values such as strings, numbers, and tuples of immutable items

In [84]:
d = {}
d = {"one": 1, 
     "two": 2, 
     "three": 3}

print(d)

{'one': 1, 'two': 2, 'three': 3}


In [83]:
type(d)

dict

In [85]:
#create a dictionary with dict()
a = [('a',1),('b',2),('c',3)]

d1 = dict(a)
print(d1)

{'a': 1, 'b': 2, 'c': 3}


In [87]:
# get the keys and the values separately with the keys() and values() methods
list(d.keys())
list(d.values())

['one', 'two', 'three']

[1, 2, 3]

In [88]:
# get a specific value from a dictionary, with method get()
d.get('one')

1
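
The difference between `get()` and square-bracket lookup shows up when the key is missing. A sketch using an illustrative dictionary named `word_to_num`:

```python
word_to_num = {"one": 1, "two": 2, "three": 3}

# Both forms return the value when the key exists
print(word_to_num["one"])          # 1
print(word_to_num.get("one"))      # 1

# get() returns None, or a default you supply, when the key is missing
print(word_to_num.get("ten"))      # None
print(word_to_num.get("ten", 0))   # 0

# Square brackets raise KeyError for a missing key
try:
    word_to_num["ten"]
except KeyError as err:
    print("Missing key:", err)
```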

In [91]:
# add or update dictionary items with the update() method
d.update({'four':4,'five':5})
print(d)

{'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5}
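
Items can be removed with the `del` statement or with `dict.pop()`, which also returns the removed value. The sketch below works on a separate illustrative dictionary (`counts`) so that `d` stays unchanged.

```python
counts = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

# del removes a key-value pair by key
del counts["five"]

# pop() removes the pair and returns its value
removed = counts.pop("four")
print(removed)   # 4

# update() overwrites the value when the key already exists
counts.update({"one": 100})
print(counts)    # {'one': 100, 'two': 2, 'three': 3}
```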


In [94]:
print(d)

{'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5}


In [96]:
#check whether a key is in a dictionary
'one' in d

#check whether a value is in a dictionary
1 in d.values()

True

True

#### Dict comprehensions 

A dict comprehension creates a dictionary from arbitrary key and value expressions.

In [99]:
#create a dictionary that maps each integer less than 10 to its square
{x: x**2 for x in range(10)} 

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
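
Dict comprehensions can also pair up two lists, swap keys and values, or filter items with a condition. The names in this sketch (`names`, `values`, `by_name`, `inverted`, `even_squares`) are illustrative.

```python
names = ["one", "two", "three"]
values = [1, 2, 3]

# Pair each name with its value using zip()
by_name = {name: value for name, value in zip(names, values)}
print(by_name)        # {'one': 1, 'two': 2, 'three': 3}

# Swap keys and values by looping over items()
inverted = {value: name for name, value in by_name.items()}
print(inverted)       # {1: 'one', 2: 'two', 3: 'three'}

# Keep only the even integers by adding a condition
even_squares = {x: x**2 for x in range(10) if x % 2 == 0}
print(even_squares)   # {0: 0, 2: 4, 4: 16, 6: 36, 8: 64}
```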

## Magic Commands

Magic commands are special commands that provide additional functionality beyond standard Python syntax. 

There are two types of magic commands in Jupyter notebooks: **line magics** and **cell magics**. 
- Line magics apply to the current line and start with `%`.
- Cell magics apply to the entire cell and start with `%%`.

Find the full list of Magic Commands and descriptions here: https://ipython.readthedocs.io/en/stable/interactive/magics.html

- `%lsmagic`: Lists all available magic commands
- `%magic`: Shows the full list of magic commands with explanations

### List variables in the environment
- `%who`: Lists all variables in the current namespace.
- `%whos`: Provides a detailed list of variables, including their types and values.

In [21]:
%who

s1	 s2	 s3	 tup	 


In [22]:
%whos

Variable   Type     Data/Info
-----------------------------
s1         set      set()
s2         set      {2, 3, 5, 7, 8, 10}
s3         set      {'A', 3, 5, 'b', 'a'}
tup        tuple    n=5


## Reference
1. https://www.py4e.com/
2. https://gcdf-cuny.gitbook.io/data-analytics-in-digital-research-with-r/1.-r-basics/data-types
3. https://www.w3schools.com/python/python_datatypes.asp
4. https://github.com/YuxiaoLuo/Analytics_Python/blob/main/Python_Review/PythonReview_1.ipynb