# Data types in Python
<hr>

Python data types are the classification or categorization of data items. Different operations can be performed on different data types.

```{mermaid}
flowchart TD
    A[Python Data Types] --> B[Numeric]

    subgraph B [Numeric]
        B1[Integer]
        B2[Float]
    end
    
    A --> C[Dictionary]
    A --> D[Boolean]
    A --> E[Set]
    A --> F[Sequence]

    

    subgraph F [Sequence]
        F1[List]
        F2[Tuple]
        F3[String]
    end
    
```

## Numeric: integer and float
<hr>

In [1]:
a = 1
a

1

In [2]:
b = 2.2
b

2.2

Unlike some other programming languages, Python does not require explicit variable type declarations. It can automatically infer numeric types. For example, in the code above, since a has no decimal point, Python treats it as an integer (`int`), while b contains a decimal point, so Python interprets it as a floating-point number (`float`).

We can use the built-in function `type` to get the type of a variable.

In [3]:
type(a)

int

In [4]:
type(b)

float

Type casting refers to converting one data type into another. Python provides several built-in functions to facilitate casting including int(), float() and str().

In [5]:
float(a)  # cast an integer variable to float

1.0

In [6]:
int(b)  # cast a float variable to integer

2

In [7]:
str(a) # cast a numeric to string

'1'

- When integers and floating-point numbers are mixed in an operation, the result will be automatically converted to a floating-point number.
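
The casting and mixing rules above can be sketched as follows. The variable names are illustrative only; note that `int()` truncates toward zero rather than rounding.

```python
# Mixing int and float in arithmetic yields a float.
total = 1 + 2.2
print(type(total))                    # <class 'float'>

# True division (/) returns a float even for two ints.
ratio = 4 / 2
print(ratio)                          # 2.0

# int() truncates toward zero; it does not round.
truncated_pos = int(2.9)
truncated_neg = int(-2.9)
print(truncated_pos, truncated_neg)   # 2 -2

# Strings that hold numerals can be cast back to numbers.
parsed = int("12") + float("0.5")
print(parsed)                         # 12.5
```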

## String
<hr>

A string is a sequence of characters. We can use either single quotes or double quotes to express a string.

In [8]:
name = 'chen'
name

'chen'

In [9]:
city = "London"
city

'London'

We can concatenate strings directly using the plus operator `+`:

In [10]:
name + city

'chenLondon'

Another common concatenation method is using the `join()` function: `str.join(sequence)`, which joins elements in the sequence with specified characters to produce a new string. For example:

In [11]:
",".join(["chen", "zhang", "li"])  # concatenating 3 strings with comma

'chen,zhang,li'

In [12]:
"-".join(["2020", "05", "13"])  # concatenating 3 strings with dash

'2020-05-13'

In [13]:
" ".join(["Brunel", "University"]) # concatenating 2 strings with space

'Brunel University'

To concatenate the same string repeatedly, we can use the multiplication operator `*`:

In [14]:
name = "John"
name * 3

'JohnJohnJohn'

We can access individual elements of a **sequence** using indexing. Indices start at 0 from the front and at -1 from the end.

- The first element is at position 0, the second at 1, and so on. Negative indices are also supported, where **-1 refers to the last element**, -2 to the second last, and so forth.

In [15]:
name[0]  # 0 is the first index

'J'

In [16]:
name[-1]

'n'

In [17]:
name[-2]

'h'

The syntax for slicing in a **sequence** is `sequence[start:end]`, where start is the starting index (**included**) and end is the stopping index (**excluded**).

In [18]:
name[0:3]  # get the first to the third characters

'Joh'

In [19]:
name[1:]  # get all the characters from index 1 to the end

'ohn'
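
A few more slicing cases, as a minimal sketch that reuses the string `"John"` from the cells above. The variable names on the left are illustrative.

```python
name = "John"

first_two = name[:2]    # start omitted: begin at index 0
last_two = name[-2:]    # negative start: count from the end
middle = name[1:3]      # index 1 included, index 3 excluded
clipped = name[2:10]    # an end past the string's length is clipped

print(first_two, last_two, middle, clipped)   # Jo hn oh hn

# Indexing, unlike slicing, fails when the index is out of range.
try:
    name[10]
except IndexError as err:
    print("IndexError:", err)
```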

To check whether a character or substring exists in a **sequence**, you can use the `in` or `not in` operators:


In [20]:
"c" not in name

True

In [21]:
"Bei" in city

False

Other data types can be cast to a string with the function `str`:

In [22]:
str(123)

'123'

In [23]:
str([1, 2, 3])

'[1, 2, 3]'

Some widely used functions for string operations:

|function|meaning|
|:--|:--|
|str.capitalize()|Return a copy of the string 'str' with the first character capitalized and the rest lowercased|
|str.upper()|Convert all the characters in the string 'str' to uppercase|
|str.lower()|Convert all the characters in the string 'str' to lowercase|
|len(str)|length of the **sequence** 'str'|
|str.isnumeric()|To check if a string 'str' contains only numeric characters|
|str.isdigit()|To check if a string 'str' contains only digits (0-9)|
|str.isalpha()|To check if all characters in a string 'str' are alphabetic (letters)|
|str.replace(old, new)|Return a copy of the string 'str' in which every occurrence of the substring 'old' is replaced by the substring 'new'|
|str.strip([chars])|Return a copy of the string 'str' with the leading and trailing characters listed in 'chars' removed (whitespace by default)|
|str.split(sep=None)|Return a list of substrings obtained by splitting the string 'str' at the separator 'sep' (whitespace by default)|

```{note}
There may be more arguments in those functions. Interested readers may refer to relevant resources for further details.
```

In [24]:
str = "python data science"
str.split()

['python', 'data', 'science']

In [25]:
str = "python Data Science"
str.lower()

'python data science'

In [26]:
str.capitalize()

'Python data science'

In [27]:
str1 = ' Brunel University London '
str1.strip(' ')

'Brunel University London'

In [28]:
str1.replace('University', 'School')

' Brunel School London '
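
The remaining functions in the table can be tried in the same way. The following is a small sketch with a made-up input string; it also shows that string methods return new strings and leave the original untouched.

```python
text = "  2020-05-13  "

date = text.strip()              # remove leading/trailing whitespace
parts = date.split("-")          # split at an explicit separator
print(parts)                     # ['2020', '05', '13']

print(parts[0].isdigit())        # True: only digits
print(date.isdigit())            # False: contains '-'
print("Brunel".isalpha())        # True: only letters
print("Brunel".upper(), len("Brunel"))   # BRUNEL 6

# The original string is unchanged by strip() and split().
print(repr(text))                # '  2020-05-13  '
```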

## List
<hr>

A Python list is a **dynamically sized** array that grows and shrinks automatically. A list can store elements of **any type** (including another list). A list is written with **square brackets** ``[ ]``, and its elements are separated by **commas**.

In [29]:
list1 = [34, 10, 25] # a list of numerics
list1

[34, 10, 25]

In [30]:
list2 = ["chen", "zhang", "wang"] # a list of strings
list2

['chen', 'zhang', 'wang']

In [31]:
list3 = [10, "wang", 33] # a list of mixed type of items
list3

[10, 'wang', 33]

- Indexing and slicing work the same way for a list as for a string.

In [32]:
list2[1]  # the second element of list2

'zhang'

In [33]:
list1[0:2]  # the elements indexed by 0 to 1

[34, 10]

In [34]:
list1[0:]  # the elements indexed by 0 to the end

[34, 10, 25]

- A step size can be given in slicing as the third value

In [35]:
list3 = [4, 7, 8, 9, 10]
list3[1:4:2]  # the elements indexed by 1 to 3 with step size 2

[7, 9]

In [36]:
list2[-2]  # the second last element in list2

'zhang'

- Using two colons followed by -1 indicates reverse order

In [37]:
list1[::-1]  # using two colons followed by -1 indicates reverse order

[25, 10, 34]

To change the value of an element in a list (a mutable **sequence**), we can directly assign a new value using indexing.

In [38]:
list = [21, 16, 30]
list[1] = 10  # change the value of the second element in list
list

[21, 10, 30]

To delete an element from a list, we can use `del` along with indexing:

In [39]:
list = [34, 46, 23]
del list[1]
list

[34, 23]
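
Item assignment and `del` work on lists because lists are mutable. Strings are immutable, so the same operations raise a `TypeError`. A minimal sketch, with illustrative variable names:

```python
scores = [34, 46, 23]
scores[0] = 50          # allowed: lists are mutable
del scores[-1]          # remove the last element
print(scores)           # [50, 46]

word = "John"
try:
    word[0] = "j"       # strings do not support item assignment
except TypeError as err:
    print("TypeError:", err)

# Build a new string instead of modifying the old one in place.
new_word = "j" + word[1:]
print(new_word)         # john
```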

For lists, the operators `+`, `*`, `in`, and `not in` function identically to their string counterparts. Below are examples:

In [40]:
list = [13, 23]
list = list + [21, 65]
list

[13, 23, 21, 65]

In [41]:
list * 2

[13, 23, 21, 65, 13, 23, 21, 65]

In [42]:
12 in list

False

In [43]:
12 not in list

True

- A 2D list is created by **nesting** one-dimensional lists within square brackets:

In [44]:
a = [[1, 2, 3], [4, 5]]
a

[[1, 2, 3], [4, 5]]

To access an element of a 2D list, use two sets of square brackets, one index in each:

In [45]:
a[1][0]

4

- Adding elements into a list

We can use `append()`, `insert()`, and `extend()` to add elements to a list:

In [46]:
list = [34, 46, 23]
list.append(3)  # append one element at the end of a list
list

[34, 46, 23, 3]

In [47]:
list.insert(1, 10)  # insert an element 10 into the index 1 of the list
list

[34, 10, 46, 23, 3]

In [48]:
list.extend([4, 5, 6])  # extend the list with some other sequence elements in the end
list

[34, 10, 46, 23, 3, 4, 5, 6]

Some other functions:

|function name|meaning|
|:--|:--|
|max(list)|Return the maximum in a list 'list'|
|min(list)|Return the minimum in a list 'list'|
|list(sequence)|Convert another sequence 'sequence' to a list|
|sequence.count(element)|Count the occurrences of an element 'element' in a **sequence** 'sequence'|
|list.reverse()|Reverse the list 'list' in place|
|list.pop(index)|Remove and return the element at the given index 'index' (the last element if no index is given)|
|list.remove(element)|Remove the first element in a list 'list' that equals 'element'|
|list.clear()|Clear all the elements in a list 'list'|

In [49]:
a = [1, 2, 3]
a.pop(-1)
print(a)

[1, 2]


In [50]:
a.reverse()
a

[2, 1]

In [51]:
str = 'chench'
str.count('c')


2
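
Two details of these functions are easy to miss: `pop()` returns the element it removes, and `remove()` deletes only the first match. A short sketch with made-up data:

```python
values = [3, 1, 3, 2]

last = values.pop()        # with no index, pop() removes the last element
print(last, values)        # 2 [3, 1, 3]

values.remove(3)           # removes only the first matching element
print(values)              # [1, 3]

print(values.count(3))     # 1
print(max(values), min(values))   # 3 1

letters = list("abc")      # convert another sequence to a list
print(letters)             # ['a', 'b', 'c']
```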

## Dictionary
<hr>

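A dictionary is another mutable container. Its format is ``{key: value}``, and the whole dictionary is enclosed in curly braces ``{ }``. **Keys must be unique and immutable**, while values can be of any data type and can be changed. A key and its value are separated by a colon ``:``.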


In [52]:
dict = {"name": "chen", "score": 95}
dict

{'name': 'chen', 'score': 95}

To access a value in a dictionary, put the corresponding key in square brackets ``[ ]``:

In [53]:
dict["name"]

'chen'

In [54]:
dict["score"]

95

To add an item to a dictionary, assign a value to a new key using square brackets `[ ]`:

In [55]:
dict["major"] = "economy"
dict

{'name': 'chen', 'score': 95, 'major': 'economy'}

To modify a value in a dictionary, put the corresponding key in square brackets and assign the new value:

In [56]:
dict["name"] = "wang"
dict["mark"] = 80
dict

{'name': 'wang', 'score': 95, 'major': 'economy', 'mark': 80}

To delete a key-value pair from a dictionary, use `del`:

In [57]:
del dict["name"]
dict

{'score': 95, 'major': 'economy', 'mark': 80}

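Note: keys must be immutable, so numbers, strings, or tuples can serve as keys, but lists cannot. Other commonly used functions for dictionaries include: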

<table>
    <tr style="border-top:solid">
        <td style="text-align:left">len(dict)</td>
        <td style="text-align:left">Return the number of items in the dictionary dict, i.e. the number of key-value pairs</td>
    </tr>
    <tr>
        <td style="text-align:left">clear()</td>
        <td style="text-align:left">Remove all items from the dictionary dict</td>
    </tr>
    <tr>
        <td style="text-align:left">get(key, default=None)</td>
        <td style="text-align:left">Return the value for the given key; if the key is not in the dictionary, return the value of default</td>
    </tr>
    <tr>
        <td style="text-align:left">values()</td>
        <td style="text-align:left">Return a view of the dictionary's values; use list() to convert it to a list</td>
    </tr>
    <tr>
        <td style="text-align:left">keys()</td>
        <td style="text-align:left">Return a view of the dictionary's keys; use list() to convert it to a list</td>
    </tr>
    <tr style="border-bottom:solid">
        <td style="text-align:left">popitem()</td>
        <td style="text-align:left">Remove the last item of the dictionary and return it</td>
    </tr>
</table>
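
A minimal sketch of these dictionary functions, using hypothetical dictionaries named `student` and `grid`. It also shows why a list cannot be a key: lists are mutable and therefore unhashable.

```python
student = {"name": "chen", "score": 95}

print(len(student))                      # 2
print(student.get("score"))              # 95
print(student.get("major", "unknown"))   # unknown (the key is absent)
print(list(student.keys()))              # ['name', 'score']
print(list(student.values()))            # ['chen', 95]

# A tuple can be a key because it is immutable; a list cannot.
grid = {(0, 1): "north"}
try:
    grid[[0, 1]] = "south"
except TypeError as err:
    print("TypeError:", err)

removed = student.popitem()              # removes the most recently added item
print(removed, student)                  # ('score', 95) {'name': 'chen'}
```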

Like lists, strings, and tuples, a dictionary can be traversed with a `for`-`in` statement; the loop yields the dictionary's keys.

In [58]:
for i in dict:
    print(i)

score
major
mark


To print the values, use the `get` function or put the key in square brackets `[ ]`.

In [59]:
for i in dict:
    print(dict[i])

95
economy
80
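
When both the key and the value are needed, one common option is the `items()` method, which is not covered in the text above. A small sketch, assuming a dictionary named `marks` with the same contents as the cell above:

```python
marks = {"score": 95, "major": "economy", "mark": 80}

pairs = []
for key, value in marks.items():   # items() yields (key, value) pairs
    pairs.append((key, value))
    print(key, "->", value)

# get() performs the same lookup as marks[key] but tolerates missing keys.
fallback = marks.get("grade", "n/a")
print(fallback)                    # n/a
```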


## Tuple
<hr>

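A Python tuple is similar to a list, except that **the elements of a tuple cannot be modified**. Tuples are created with **parentheses**, and their elements are separated by commas. Accessing elements or slicing a tuple works much as it does for a list: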

In [60]:
tup = (13, "zh", 20)
tup[1]

'zh'

In [61]:
tup[0:2]

(13, 'zh')

In [62]:
tup[0:]

(13, 'zh', 20)

The elements of a tuple cannot be modified, but `del` can delete the whole tuple, and the operators `+`, `*`, `in`, and `not in` are supported:

In [63]:
tup = (13, "zh", 20)
del tup  # delete tup entirely

In [64]:
tup = (23, 45, 21)
tup = tup + (32, 21)
tup

(23, 45, 21, 32, 21)

In [65]:
tup * 2

(23, 45, 21, 32, 21, 23, 45, 21, 32, 21)

In [66]:
25 in tup

False

In [67]:
25 not in tup

True

<table>
    <tr style="border-top:solid">
        <td style="text-align:left">max(tup)</td>
        <td style="text-align:left">Return the largest element of the tuple tup</td>
    </tr>
    <tr>
        <td style="text-align:left">min(tup)</td>
        <td style="text-align:left">Return the smallest element of the tuple tup</td>
    </tr>
    <tr>
        <td style="text-align:left">tuple(seq)</td>
        <td style="text-align:left">Convert a sequence such as a list to a tuple</td>
    </tr>
    <tr>
        <td style="text-align:left">len(tup)</td>
        <td style="text-align:left">Return the length of the tuple tup, i.e. the number of elements</td>
    </tr>
    <tr style="border-bottom:solid">
        <td style="text-align:left">tup.count(obj)</td>
        <td style="text-align:left">Count how many times an element occurs in the tuple tup</td>
    </tr>
</table>
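
A short sketch of tuple immutability and the functions listed above. The trailing-comma rule for one-element tuples is an extra detail not mentioned in the text.

```python
tup = (23, 45, 21, 21)

try:
    tup[0] = 99                       # tuples do not support item assignment
except TypeError as err:
    print("TypeError:", err)

print(len(tup), max(tup), min(tup))   # 4 45 21
print(tup.count(21))                  # 2

converted = tuple([1, 2, 3])          # list -> tuple
print(converted)                      # (1, 2, 3)

single = (5,)                         # a one-element tuple needs a trailing comma
print(type(single), type((5)))        # <class 'tuple'> <class 'int'>
```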

## Set*
<hr>

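A set is an unordered collection of **unique** elements, usually created with curly braces { }. Note: an empty set must be created with set() rather than { }, because { } creates an empty dictionary.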

In [68]:
set1 = {34, 23, "chen"}
set1

{23, 34, 'chen'}

In [69]:
set2 = {12, "34", 10}
set2

{10, 12, '34'}
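
The two points above, uniqueness and the empty-set syntax, can be checked directly. A minimal sketch with illustrative names:

```python
numbers = {3, 1, 3, 2, 1}
print(len(numbers))             # 3: duplicates are dropped

empty_set = set()
empty_dict = {}
print(type(empty_set))          # <class 'set'>
print(type(empty_dict))         # <class 'dict'>

unique_letters = set("hello")   # deduplicate the characters of a string
print(sorted(unique_letters))   # ['e', 'h', 'l', 'o']
```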

Common operators between sets include:

<table>
    <tr style="border-top:solid">
        <td style="text-align:left">-</td>
        <td style="text-align:left">Difference: the elements of the left set that are not in the right set</td>
    </tr>
    <tr>
        <td style="text-align:left">|</td>
        <td>Union of the two sets</td>
    </tr>
    <tr>
        <td style="text-align:left">&</td>
        <td>Intersection of the two sets</td>
    </tr>
    <tr style="border-bottom:solid">
        <td style="text-align:left">^</td>
        <td>Symmetric difference: the elements that are in exactly one of the two sets</td>
    </tr>
</table>

In [70]:
set1 = {34, 12, "chen"}
set2 = {12, "wang", 10}
set1 - set2

{34, 'chen'}

In [71]:
set1 | set2

{10, 12, 34, 'chen', 'wang'}

In [72]:
set1 & set2

{12}

In [73]:
set1 ^ set2

{10, 34, 'chen', 'wang'}

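Use the ``add`` function to add an element to a set, ``update`` to add multiple elements (from a list, tuple, or dictionary), and ``remove`` to delete an element. To test whether an element is in a set, use ``in`` or ``not in``: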

In [74]:
set = {13, 45, 67}
set.add(10)
set

{10, 13, 45, 67}

In [75]:
set.remove(10)
set

{13, 45, 67}

In [76]:
set.update([80, 44])
set

{13, 44, 45, 67, 80}

In [77]:
44 in set

True

In [78]:
44 not in set

False

<table>
    <tr style="border-top:solid">
        <td style="text-align:left">len(set)</td>
        <td style="text-align:left">Return the number of elements in the set</td>
    </tr>
    <tr>
        <td style="text-align:left">set.add()</td>
        <td style="text-align:left">Add one element to the set</td>
    </tr>
    <tr>
        <td style="text-align:left">set.remove()</td>
        <td style="text-align:left">Remove one element from the set</td>
    </tr>
    <tr>
        <td style="text-align:left">set.update(seq)</td>
        <td style="text-align:left">Add multiple elements, given as a list, tuple, or dictionary (for a dictionary, its keys are added)</td>
    </tr>
    <tr>
        <td style="text-align:left">set1.issubset(set2)</td>
        <td style="text-align:left">Check whether set1 is a subset of set2</td>
    </tr>
    <tr>
        <td style="text-align:left">set1.issuperset(set2)</td>
        <td style="text-align:left">Check whether set1 is a superset of set2</td>
    </tr>
    <tr>
        <td style="text-align:left">set1.isdisjoint(set2)</td>
        <td style="text-align:left">Check whether the two sets share any element; return True if they share none</td>
    </tr>
    <tr style="border-bottom:solid">
        <td style="text-align:left">set1.union(set2)</td>
        <td style="text-align:left">Return the union of the two sets</td>
    </tr>
</table>
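
The comparison functions in the table can be sketched as follows, with made-up sets. The example also shows that `remove()` raises a `KeyError` when the element is missing.

```python
small = {1, 2}
big = {1, 2, 3}
other = {8, 9}

print(small.issubset(big))       # True
print(big.issuperset(small))     # True
print(small.isdisjoint(other))   # True: no shared elements
print(small.union(other) == (small | other))   # True

# update() with a dictionary adds the dictionary's keys.
tags = {"a"}
tags.update({"b": 1, "c": 2})
print(sorted(tags))              # ['a', 'b', 'c']

# remove() raises KeyError if the element is not in the set.
try:
    tags.remove("zzz")
except KeyError:
    print("KeyError: 'zzz' is not in the set")
```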

## Boolean
<hr>

Boolean values (`bool`) in Python are usually produced by logical comparisons and have only two possible results: ``True`` or ``False``.

In [79]:
10 > 3

True

In [80]:
3 == 4

False

In a logical comparison, two equals signs ``==`` test for equality, whereas a single equals sign denotes assignment.

In [81]:
a = 3  # one equals sign assigns 3 to a
a == 4  # two equals signs test whether a equals 4

False

To combine several logical conditions, that is, "and", "or", and "not", Python provides ``and``, ``or``, and ``not`` respectively:

In [82]:
10 > 3 and 3 > 2

True

In [83]:
10 > 3 and 3 > 4

False

In [84]:
10 > 3 or 3 > 4

True

In [85]:
not 3 > 4

True
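
Comparison results can be stored in variables and combined like any other value. A minimal sketch with hypothetical variable names; it also illustrates that `not` binds more tightly than `and`, which binds more tightly than `or`.

```python
age = 25
has_ticket = False

can_enter = age >= 18 and has_ticket
print(can_enter)            # False
print(type(can_enter))      # <class 'bool'>

# Evaluated as: True or (False and False)
mixed = True or False and False
print(mixed)                # True

# Evaluated as: (not False) and False
negated = not False and False
print(negated)              # False
```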

## The `random` module
<hr>

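Random numbers are widely used in programming. Python's built-in `random` module generates common kinds of pseudo-random numbers (numbers produced by a deterministic algorithm are pseudo-random, not truly random).

Commonly used functions in `random`: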

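|function|meaning|
|:--:|:--|
|seed(a=None)|Initialize the random seed; by default the current system time or an OS-provided randomness source is used|
|random()|Return a random float in [0.0, 1.0)|
|randint(a, b)|Return a random integer in [a, b], both endpoints included|
|uniform(a, b)|Return a random float between a and b|
|shuffle(seq)|Shuffle the elements of a mutable sequence in place; returns `None`|
|sample(pop, k)|Randomly choose k elements from pop without replacement, returned as a list|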


The seed can be set with the function `seed`; with the same seed, the same sequence of random numbers is generated on every run.

In [86]:
import random

random.random()  # no seed is set, so each run shows a different number

0.831045124371628

In [87]:
random.seed(100)
random.random()  # a seed is set, so each run shows the same number

0.1456692551041303
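
The other functions can be sketched as follows. The data is made up, and no specific random outputs are shown because they depend on the seed and the Python version; only properties that always hold are printed.

```python
import random

# The same seed reproduces the same sequence.
random.seed(100)
first = [random.randint(1, 6) for _ in range(5)]
random.seed(100)
second = [random.randint(1, 6) for _ in range(5)]
print(first == second)             # True

cards = [1, 2, 3, 4, 5]
result = random.shuffle(cards)     # shuffles the list in place
print(result)                      # None
print(sorted(cards))               # [1, 2, 3, 4, 5]

picked = random.sample(cards, 3)   # 3 elements chosen without replacement
print(len(picked), len(set(picked)))   # 3 3

x = random.uniform(2, 5)
print(2 <= x <= 5)                 # True
```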

## Exercises
<hr>

```{exercise}
:label: circle-area
Given a radius, compute the area of the circle.
```

````{solution} circle-area
:class: dropdown
```{code-block} python
import math

r = 10
area = math.pi * r**2
print("Area of the circle: %.6f" % area)
```
````

```{exercise}
:label: random
Randomly pick 3 characters from the string 'abcedefg'. (Hint: use a function from the random library.)

```

````{solution} random
:class: dropdown
```{code-block} python
import random

random.sample('abcedefg', 3)
```
````

```{exercise}
:label: quad-equation
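Read the numbers a, b, c from the user and solve the quadratic equation $ax^2 + bx + c = 0$.
(Hint: the cmath module can take the square root of a negative number.)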
```

````{solution} quad-equation
:class: dropdown
```{code-block} python
# import the cmath module (math functions for complex numbers)
import cmath

a = float(input("Enter a: "))
b = float(input("Enter b: "))
c = float(input("Enter c: "))

# compute the discriminant
d = (b**2) - (4 * a * c)

# the two roots
sol1 = (-b - cmath.sqrt(d)) / (2 * a)
sol2 = (-b + cmath.sqrt(d)) / (2 * a)

print("The solutions are {0} and {1}".format(sol1, sol2))
```
````

```{exercise}
:label: triangle-area
Read the three side lengths of a triangle from the user and compute the area of the triangle.
```

````{solution} triangle-area
:class: dropdown
```{code-block} python
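# read the three side lengths
a = float(input("Enter the first side length: "))
b = float(input("Enter the second side length: "))
c = float(input("Enter the third side length: "))

# compute the semi-perimeter
s = (a + b + c) / 2

# compute the area with Heron's formula
area = (s * (s - a) * (s - b) * (s - c)) ** 0.5
print("Area of the triangle: %0.2f" % area)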
```
````

```{exercise}
:label: dict-values
Define a dictionary, then compute the sum of all its numeric values.
```

````{solution} dict-values
:class: dropdown
```{code-block} python
def dictSum(myDict):
    total = 0  # named total to avoid shadowing the built-in sum()
    for key in myDict:
        total = total + myDict[key]
    return total


scores = {'a': 100, 'b': 200, 'c': 300}
print("Sum:", dictSum(scores))
```
````
