
### Data types in Python

#### This class covers 14 built-in data types in Python

##### Fundamental Types
1. Integer - Number
2. Float   - Number
3. Complex - Number
4. Boolean - True / False - Special type
5. None    - Special Type
6. String (single character) - Text character, including special symbols. Python has no separate character type: a single character is simply a string of length 1. (Part 1)

##### Derived Data Types
6. String sequence - A sequence of characters can be thought of as derived from single characters (Part 2). However, Python treats every string as the same immutable `str` type, whether it holds one character or many characters 'stringed' together. Indexing, slicing and iteration work on all strings, including a single-character string (where they simply give back that one character).

7. List - A sequential datatype. Made of sequence of other objects. Mutable. Indexing and slicing work.
8. Tuple - A sequential datatype. Made of sequence of other objects. Immutable object in itself. But objects (elements) inside tuple may be mutable. Indexing and slicing work.
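9. Set - Mutable object. But objects inside a set must be hashable (in practice, immutable). An unordered datatype. Therefore indexing and slicing are not supported.
10. Dictionary - A mutable datatype. Keys of a dictionary must be hashable (in practice, immutable) objects. But since a dictionary itself is mutable, its set of keys may change (i.e. keys may be deleted and/or new keys added). Values may be mutable objects. Values are looked up by key; positional indexing and slicing are not supported.
11. Frozen Sets - Immutable set
12. Range - An immutable sequence of integers defined by start, stop and step.
13. Bytes - An immutable sequence of integers, each between 0 and 255.
14. Bytearrays - A mutable counterpart of bytes.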

##### There is no concept of fixing the size of a datatype / object in Python. Python handles this internally. Some other languages, such as C++, require the programmer to choose a fixed-size type explicitly. 

#### Below, we shall take a brief overview of each datatype. However, Strings, Lists, Tuples, Dictionaries and Sets (and Frozen Sets) have a lot of methods and unique features. We will talk about each of these datatypes in detail in subsequent classes. 


###### Today, I wish to focus on the difference between the fundamental data types and derived datatypes. 

###### I shall briefly talk about two examples (Strings and Lists) of what we mean by sequential datatype.

###### A short explanation of nesting

###### I wish to move on to indexing, slicing and iteration over sequential datatypes.

###### Then we shall talk about the concept of Object Re-usability and finally Mutability and Immutability.

###### I shall skip the tuple, dictionary, sets and frozen sets datatypes for now and move on to the range, byte and bytearray datatypes.

###### Strings, Lists, Tuples, Dictionaries and Sets (along with Frozen Sets) will each need a lengthy session to discuss initialisation, the methods available and unique attributes. So, we shall take a deep dive into them one by one.

###### At the end, I will provide a chart showing typecasting between datatypes. Please go ahead and try the type casting in the chart by yourselves in your notebooks and compare the results to the expected results. If a result is unexpected, try to analyse why. We can discuss any questions about this in the following class.


In [1]:
#1. Integers

# Any whole number including 0 and negative numbers. E.g.
# Python integers have arbitrary precision: Python automatically allocates as much memory as the value needs, so there are no fixed 8-bit, 16-bit or 32-bit integer types.

print(type(0))
print(type(7832903))
print(type(-88))
print(type(-23018413))

<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>


In [2]:
# sys.getsizeof() is a function that returns the size in bytes of the memory allocated to an object in Python. 
# The result is implementation- and platform-specific, which means that on a different OS or Python build it may give different results.
# We need to import sys module to use it.
# It returns the size of the object it is called on - including overhead like reference count and type information -
# but not the size of any other objects it refers to.

import sys
print(sys.getsizeof(1))
print(sys.getsizeof(23018413))

28
28
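
To make the point about integer sizes concrete, here is a minimal sketch (the names `small` and `big` are only illustrative). It assumes CPython, where larger integers simply occupy more memory; the exact byte counts depend on the platform and Python build.

```python
import sys

small = 1
big = 2 ** 100  # far larger than any 64-bit integer

print(big)
# The exact sizes vary by platform; the point is that big needs more memory than small.
print(sys.getsizeof(small), sys.getsizeof(big))
```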


In [3]:
#2. Float

# Any real number with decimal point
# Float literals can only be written in decimal (optionally with an exponent such as 1.5e3), not in binary, octal or hexadecimal
# There is no separate single-precision type in Python: a float is normally a double-precision (64-bit) number
# Stored in binary (64 bits) as sign bit (MSB) + exponent + mantissa

print(type(10.2))
print(type(0.132022))
print(type(.132022))

print(sys.getsizeof(0.132022))
print(0.13022.__sizeof__())



<class 'float'>
<class 'float'>
<class 'float'>
24
24
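
A short sketch of what the 64-bit binary representation means in practice, assuming the usual IEEE 754 double format that CPython uses on common platforms. Because values such as 0.1 cannot be stored exactly in binary, compare floats with a tolerance rather than with `==`.

```python
import math
import sys

print(sys.float_info.mant_dig)   # number of mantissa bits in a float

total = 0.1 + 0.2
print(total == 0.3)              # exact comparison fails because of rounding
print(math.isclose(total, 0.3))  # tolerance-based comparison succeeds

half_hex = (0.5).hex()           # hexadecimal view of the stored value
print(half_hex)
print(float.fromhex(half_hex))   # converts the hexadecimal form back to a float
```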


In [4]:
#3. Complex numbers

# Takes the shape of a + b*j (or J) where j/J represents square root of -1
# a is the real part
# b is a real number multiplied by the imaginary unit j, giving the imaginary part
# a can be written as a decimal, binary, octal or hexadecimal literal
# b must be written as a decimal literal (0b101j is a syntax error)
# Cannot perform modulo or floor division on complex numbers but other arithmetic operations can be performed
# Mainly used in mathematical and engineering applications.
d = 11 + 5j
b = 0b1011 + 5j
o = 0o13 + 5j
x = 0xB + 5j

print(d,b,o,x, sep = '\n')

y = 11 + 0b101
print(y)

print(type(y))
# z = 11 + 0b101J
# print(z)

(11+5j)
(11+5j)
(11+5j)
(11+5j)
16
<class 'int'>
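
The parts of a complex number can be read back through its attributes. The following is a small illustrative sketch (the variable names are not from the notebook above); it also shows that floor division is rejected with a `TypeError`.

```python
z = 3 + 4j

print(z.real, z.imag)   # the real and imaginary parts are stored as floats
print(abs(z))           # magnitude: sqrt(3**2 + 4**2)
print(z.conjugate())    # same real part, imaginary part negated

floor_failed = False
try:
    z // 2              # floor division is not defined for complex numbers
except TypeError as err:
    floor_failed = True
    print('TypeError:', err)
```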


In [5]:
#4. Boolean Datatype

# Can only hold the values True and False. bool is a subclass of int, so True equals 1 and False equals 0.
# When any object is tested for truth, 0, the empty string, None and empty containers count as False and all else counts as True.

# However, while COMPARING, 100 (though it tests as true) will NOT be equal to 1 or True

print(100==200)
print(100==100)

False
True


In [6]:
a = 1
b = 0
c = 100

if a == True:
    print('Hello. I could print because the if condition is True')
    
if b == False:
    print('I could also print because the if condition is True')

Hello. I could print because the if condition is True
I could also print because the if condition is True


In [7]:
a = 1
b = 0
c = 100

if a == True:
    print('Hello. I could print because the if condition is True')
    
if b == False:
    print('I could also print because the if condition is True')
    
if c:
    print('c evaluates to True so I could print because the if condition is True')

Hello. I could print because the if condition is True
I could also print because the if condition is True
c evaluates to True so I could print because the if condition is True


In [8]:
if True:
    print('Hello. I could print because the if condition is True')
    
if False:
    print('Hi there! I will not be printed because the if condition is False')

Hello. I could print because the if condition is True


In [9]:
if 100:
    print('Hello. I could print because the if condition is True')

Hello. I could print because the if condition is True


In [10]:
if 0:
    print('Hi there! I will not be printed because the if condition is False')

In [11]:
if 'Python':
    print('Hello. I could print because the if condition is True')

if '':
    print('Hi there! I will not be printed because the if condition is False')

Hello. I could print because the if condition is True
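
The difference between "tests as true" and "equals True" can be checked directly. This is a minimal sketch; the list `falsy_values` is just an illustrative selection of built-in values that count as False.

```python
print(isinstance(True, int))  # bool is a subclass of int
print(True + True)            # True behaves like 1 in arithmetic
print(100 == True)            # comparison: 100 is not equal to 1
print(bool(100))              # truth test: any non-zero number is truthy

falsy_values = [0, 0.0, '', None, [], (), {}, set()]
truth_results = [bool(v) for v in falsy_values]
print(truth_results)          # every entry is False
```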


In [12]:
#5. None datatype

a = None
print(a)
print(type(a))

None
<class 'NoneType'>


In [89]:
a==10   # this is a comparison, not an assignment: the result is discarded and a is unchanged
b=a
print(b)
print(b==None)


Python
False


In [13]:
b = a==10
print(b)
print(b==None)

False
False


In [14]:
b = ''
c = 0

print("'' = None is", b == None)
print("0 = None is", c == None)

'' = None is False
0 = None is False
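
Since there is only one `None` object, the usual way to test for it is `is None` rather than `== None`. A minimal sketch follows; `find_first_even` is a hypothetical helper written only for this illustration.

```python
def find_first_even(numbers):
    """Hypothetical helper: return the first even number, or None if there is none."""
    for n in numbers:
        if n % 2 == 0:
            return n
    return None

result = find_first_even([1, 3, 5])
if result is None:
    print('No even number found')

zero_result = find_first_even([0, 1])
print(zero_result is None)  # 0 is falsy, but it is not None
print(not zero_result)      # a plain truth test cannot tell 0 and None apart
```

Testing with `is None` avoids mistaking valid falsy results such as `0` or `''` for "no result".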


In [90]:
x = 3
y = 5

def temp_func():
    print('****')
    z = x + y
    #return temp_func()
    
print('Result of temp_func is', temp_func())

****
Result of temp_func is None


In [None]:
def temp_func():
    print('****')
    z = x + y
    return 'This is the return from the temporary function', z

print(temp_func())

In [17]:
#6. Strings

#Strings are text or characters they are usually enclosed in single ('Python'), double ("Python") or triple ('''Python''')
#quotation marks

print(type('Python'))
print(type("Python"))
print(type('''Python'''))

<class 'str'>
<class 'str'>
<class 'str'>


#### Before taking you any further in the datatypes, I wish to digress into memory management and indexing, slicing and iteration in Python.

#### Then we shall discuss the idea of Object-Re-usability. 

#### Then we shall take a look at another simple datatype - Lists

#### And finally, we can explain mutability and immutability in Python.

In [18]:
### Memory Management in python

## On start-up, CPython pre-creates or caches some frequently used objects:
#1. Single-character strings (such as the upper- and lower-case letters)
#2. Integers from -5 to 256
#3. Boolean literals - True, False
#4. The None object

# These are the most used objects and this caching is done to reduce overhead. Every time one of these objects
# is used, Python points to the same memory location. 

### Note that this range is typical for CPython i.e. the C implementation of Python. CPython is the most common 
### implementation of Python. However, other implementations such as Jython, IronPython etc. may behave differently.

x = 'a'
y = 'a'

print(x is y)
print(id(x), id(y))

True
2740308033456 2740308033456


In [19]:
a = 'Python'
b = 'Python'
print(a is b)
print(id(a), id(b))

True
2740308175536 2740308175536


In [20]:
a = '#'
b = '#'

a = 0
print(type(a))


# print(a is b)
# print(id(a), id(b))

<class 'int'>


In [21]:
x = 0
y = 0

print(x is y)
print(id(x), id(y))

True
140737028761344 140737028761344


In [22]:
x = 1
y = 1

print(x is y)
print(id(x), id(y))

True
140737028761376 140737028761376


In [23]:
x = -1
y = -1

print(x is y)
print(id(x), id(y))

True
140737028761312 140737028761312


In [24]:
x = 100
y = 100

print(x is y)
print(id(x), id(y))

True
140737028764544 140737028764544


In [25]:
x = 256
y = 256

print(x is y)
print(id(x), id(y))

True
140737028769536 140737028769536


In [26]:
x = 257
y = 257

print(x is y)
print(id(x), id(y))

False
2740375533328 2740375533904


In [27]:
x = -5
y = -5

print(x is y)
print(id(x), id(y))

True
140737028761184 140737028761184


In [28]:
x = -6
y = -6

print(x is y)
print(id(x), id(y))

False
2740375533904 2740375532944


In [29]:
### For the most part, the above range holds true for CPython. However, the interpreter may also reuse equal constants
### that are compiled together (for example in one script or function), so how the code is run matters. It is not surprising
### to see objects outside this range sometimes having the same ID. Use == to compare values and keep `is` for identity checks.

x = 10000
y = 10000
z = x

print(x is y)
print(id(x), id(y))
# print(z is x)

### Lets see this in another IDE 'Thonny'


#XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

False
2740375534096 2740375534160
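
Because identity outside the cached range depends on the implementation, the safe habit is to compare values with `==`. The sketch below builds the objects at run time so that compile-time constant sharing does not interfere; the result of the `is` line is typical for CPython but not guaranteed. `sys.intern` is the standard way to ask for a shared string object explicitly.

```python
import sys

a = int('1000')   # built at run time, so no constant sharing is involved
b = int('1000')
print(a == b)     # value comparison: always True here
print(a is b)     # identity: implementation-dependent, typically False in CPython

s1 = sys.intern(''.join(['py', 'thon']))  # string assembled at run time, then interned
s2 = sys.intern('python')
print(s1 is s2)   # interned equal strings are the same object
```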


In [30]:
v = None
w = None

print(v is w)

True


In [31]:
v = True
w = True

print(v is w)

True


In [32]:
v = False
w = False

print(v is w)

True


In [33]:
# Indexing in Python

abc = 'Python'
print(abc[0])
print(abc[5])

print(abc[0:5])
print(abc[-6:-2])

P
n
Pytho
Pyth


In [34]:
abc[-1]
abc[-6]

'P'

In [35]:
xyz = [1,2,3,4,5,6]

print(type(xyz))

<class 'list'>


In [36]:
xyz = [1,2,3,4,5,6]
print(xyz[0])
print(xyz[5])

1
6


In [37]:
print(xyz[-1])
print(xyz[-6])

6
1


In [38]:
## Nested Indexing

nest_lst = [1,2*100,['a', 2, 'Hello'], ['Nested', 'List'], 1000]

print(nest_lst[0])
print(nest_lst[1])
print(nest_lst[-1])

1
200
1000


In [39]:
print(nest_lst[2][2])
print(nest_lst[2][1])
print(nest_lst[3][0])

Hello
2
Nested


In [70]:
## Object reutilisation in Python

a = 'Python'
bt = 'python'
b = 'P'
c = 'y'
d = 't'
e = 'h'
f = 'o'
g = 'n'

# print(a[0], a[0] is b)
# print(id(a[0]), id(b))

# print(id(a))

print(id(a[1]), id(bt[1]), id(c))
print(a[1] is bt[1])
print(a[1] is c)
print(bt[1] is c)
# print(a[1], a[1] is c)
# print(a[2], a[2] is d)
# print(a[3], a[3] is e)
# print(a[4], a[4] is f)
# print(a[5], a[5] is g)

2740306594096 2740306594096 2740306594096
True
True
True


In [41]:
a = [1,2,3,4,5,6]
b = 1
c = 2
d = 3
e = 4
f = 5
g = 6

print(a[0], a[0] is b)
print(a[1], a[1] is c)
print(a[2], a[2] is d)
print(a[3], a[3] is e)
print(a[4], a[4] is f)
print(a[5], a[5] is g)

1 True
2 True
3 True
4 True
5 True
6 True


In [42]:
a = [1000,2000,3000,4000,5000,6000]
b = 1000
c = 2000
d = 3000
e = 4000
f = 5000
g = 6000

print(a[0], a[0] is b)
print(a[1], a[1] is c)
print(a[2], a[2] is d)
print(a[3], a[3] is e)
print(a[4], a[4] is f)
print(a[5], a[5] is g)

1000 False
2000 False
3000 False
4000 False
5000 False
6000 False


In [43]:
b = 1
c = 2
d = 3
e = 4
f = 5
g = 6

a = [b,c,d,e,f,g]

print(a[0], a[0] is b)
print(a[1], a[1] is c)
print(a[2], a[2] is d)
print(a[3], a[3] is e)
print(a[4], a[4] is f)
print(a[5], a[5] is g)

1 True
2 True
3 True
4 True
5 True
6 True


### Indexing and Slicing(Part 2)

In [None]:
## Syntax for slicing

In [48]:
## Slicing in Python takes 3 parameters

# [Start index : Stop Index : Step Size] enclosed in Square Brackets next to the object that supports indexing. 

abc = 'Python'

print(abc[0:6:1])

Python


In [45]:
print(abc[0:5:1])

Pytho


In [46]:
print(abc[1:4:1])

yth


In [51]:
print(abc[0:6:3])

Ph


In [52]:
# All three parameters are optional i.e. if not provided the defaults will be taken. 

# Defaults (for a positive step) are:

# Start index - defaults to 0
# Stop index - defaults to end of indexable object
# Step size - defaults to 1

# With a negative step the defaults flip: start defaults to the last element and stop to just before the first.

# However, if defaults are to be used just include a colon inside square brackets so - [:] 

print(abc[:])

Python


In [55]:
# We can also do negative slicing i.e. slice backwards or from the end.

print(abc[::-2])

nhy


In [54]:
print(abc[1:4:-1])




In [56]:
print(abc[4:1:-1])

oht
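
The way defaults are filled in can be inspected with a `slice` object, whose `indices()` method reports the concrete start, stop and step for a given length. A small sketch using the same string:

```python
abc = 'Python'

print(abc[::-1])           # full reversal: start and stop default to the two ends
print(repr(abc[1:4:-1]))   # empty: stepping backwards from index 1 never reaches 4

forward = slice(None, None, None)
backward = slice(None, None, -1)
print(forward.indices(len(abc)))   # concrete (start, stop, step) for [:]
print(backward.indices(len(abc)))  # concrete (start, stop, step) for [::-1]
```

For a negative step, the reported stop of -1 means "one position before index 0", which is why the first character is still included.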


In [57]:
# Lists

x = [1,2,3,4,5]

print(x[0])
print(x[-1])
print(x[::-1])

1
5
[5, 4, 3, 2, 1]


In [None]:
y = []
print(y)

In [58]:
z = (1,2,3,4,5)
print(type(z))

<class 'tuple'>


In [59]:
y = list(z)
print(y)

[1, 2, 3, 4, 5]


In [60]:
y = list((1,2,3,4,5))
print(y)

[1, 2, 3, 4, 5]


In [None]:
# A list is a datatype in Python that holds a collection of elements. 
# It can hold different datatypes. 
# It is mutable. 
# It is considered a 'Sequence'. The reason it is considered a sequence is that it is ordered and elements can be accessed
# through the index position. It holds other objects (or more specifically, pointers to the locations where the objects
# are actually stored). Slicing, indexing and iteration can be performed over sequential datatypes.

# We can perform iteration on a list:

for a in x:
    print(f'7 times {a} is {a * 7}.')

In [None]:
    
#Nesting - Lists, Tuples and Dictionaries(only for their values) support nesting. 

lst11 = [1,2,3, [4,5,6]]

for a in lst11:
    print(a)

In [None]:
# To access a nested list(or other iterable) by index number, simply call the index positions one after the other, 
# Like looking for a street number and then the house number. 

print(lst11[3][2])

lst111 = [1,[2,'a', 'KKR', 'RCB'], 3, [4,5,6]]

print(lst111[1][2])
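
When the nesting depth is not known in advance, chained indexing is not enough and the elements can be visited recursively instead. The following is one possible sketch; `flatten` is a hypothetical helper name, and it only descends into lists and tuples.

```python
def flatten(items):
    """Hypothetical helper: yield the elements of nested lists/tuples one by one."""
    for item in items:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)   # descend into the nested sequence
        else:
            yield item

nested = [1, [2, 'a', 'KKR', 'RCB'], 3, [4, 5, 6]]
flat = list(flatten(nested))
print(flat)
```

Strings are deliberately not treated as nested here; otherwise 'KKR' would be split into single characters.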

In [None]:
# lst22 = [1,2,3,[4,5,6]]

# print(lst22)

# for a in lst22:
#     if hasattr(a, '__iter__'):
#         print(f'{a} is iterable and has the following values')
#         for b in a:
#             print(b)
#     else:
#         print(f'{a} is not iterable')

In [None]:
# Python also has an array module for creating arrays. An array can only hold one datatype and that has to be specified
# while creating the array. An array is mutable i.e. indexing, slicing, iteration, addition/deletion of elements, 
# concatenation etc - all these can be performed. 

import array as arr

new_arr = arr.array('i', [1,2,3,4,5])

print(new_arr)
print(new_arr[0])
print(type(new_arr))

#Note arr is just an alias (nickname) for the module we are importing so it is easier & faster to type. Once we alias
# the import, all the functions in the module will work exactly as originally intended.

In [None]:
# Arrays are fast to process since they contain only one datatype and are an important datatype in the Data Science journey. 
# However, Numpy library is mostly used for arrays in Data Science and we shall cover the Numpy library in detail going
# forward.

# Difference between Array and List - An array can only hold one datatype. Lists can hold multiple datatypes. They can hold
# other sequential datatypes.

x = [1,'a', None, True, 3+2j, 10.2, ['P', 'for', 'pajamas']]

for a in x:
    print(a)

In [None]:
for a in x[-1]:
    print(a)

# for a in x[0]:
#     print(a)

In [None]:
#### Mutability and immutability

# Fundamental datatypes in Python are immutable. You can think of an immutable datatype as one that cannot be broken down
# further without losing its meaning.

# String fundamental datatype is peculiar in Python because though it is sequential and supports indexing, slicing and
# iteration, it is immutable. 

# String Character (NOTE - I am clearly not saying a string of characters but a single character)

# If I type the letter P, can I in any way change it to another letter without changing what it represents?
# The number 645, can I in any way change it to another number without changing what it represents?
# Complex number 3+5j, if I change this to 2 + 4j - does this not change what it represents?
# Changing True to False in Boolean datatype obviously changes its meaning
# Changing None to any other value makes it lose its meaning. 

#However, a sequential derived datatype may or may not be mutable based on how it is defined in Python and its uses.
# A list of students in class [Adithya, Subham, Vaishnavi, Venu, Venkat, Varun]
# I could change one of them it still remains the list of students in class but only the names inside have changed.

# One may argue that a 3-digit number is sequential. No it is not. Please remember that 645 is basically 6 x 100 + 4 x 10 
# + 5 x 1. Without treating this as a single unit - the computer would have no way of understanding its actual value of 645.

# Now that we can agree that fundamental datatypes in Python are immutable, let us understand how Python handles fundamental
# datatypes and derived datatypes. Note, for a datatype to be mutable it must be a derived datatype. We are hearing the
# word mutable so much. What IS it? An object is mutable if its contents can be changed in place, without creating a new object.

In [None]:
a = 1000
b = 'Python'

lst1 = [a, b]

print(a is lst1[0])
print(b is lst1[1])
print(id(a), id(lst1[0]))
print(id(b), id(lst1[1]))

#So, as we see, lst1 is actually holding the objects which were in variables a and b. Specifically, it is actually holding
#only the pointers to where these objects are located in the memory.
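
One practical consequence of a list holding references is aliasing: two names can refer to the same list, so a change made through one name is visible through the other. An immutable string behaves differently. The variable names below are illustrative only.

```python
names = ['Adithya', 'Subham']
alias = names                # a second name for the SAME list object
names.append('Venu')         # in-place change
print(alias)                 # the change is visible through the alias
print(alias is names)        # still one object

word = 'Python'
original = word
word += '!'                  # builds a NEW string and rebinds the name word
print(original)              # the original string is untouched
print(word is original)
```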

In [None]:
lst1 = [1, 'P']

a = 1
b = 'P'
print(1 is lst1[0])
print('P' is lst1[1])

In [None]:
print(a is lst1[0])
print(b is lst1[1])

In [None]:
print(id(1), id(a), id(lst1[0]))
print(id('P'), id(b), id(lst1[1]))

In [None]:
lst1 = [1000, 'Python']
a = 1000
b = 'Python'
print(1000 is lst1[0])
print('Python' is lst1[1])

In [None]:
print(a is lst1[0])
print(b is lst1[1])

In [None]:
print(id(1000), id(a), id(lst1[0]))
print(id('Python'), id(b), id(lst1[1]))

In [None]:
lst1 = [a, b]

print(a is lst1[0])
print(b is lst1[1])

In [None]:
print(id(a), id(lst1[0]))
print(id(b), id(lst1[1]))

In [None]:
# We can access the element at a certain index location in the list and alter it.

c = 2000

lst1[1] = c

print(lst1)

In [None]:
# We can extend the list by using the append function i.e. add another element to it.

lst1.append(b)
print(lst1)

In [None]:
# We can iterate over it.

for a in lst1:
    print(a)
    
for a in range(len(lst1)):
    print(lst1[a])

In [None]:
# All the above prove that a list is a MUTABLE datatype in Python. A list is NOT a fundamental datatype. It is a derived
# data type. It is a sequence: it is ordered and it stores references to (the memory locations of) the objects it
# contains rather than the objects themselves. Therefore we can perform slicing, indexing and iteration over it. 

# But that does not mean that a datatype that we CAN slice, index or iterate over is necessarily a mutable type.
# Among derived datatypes, tuples are immutable sequences. Frozen sets are immutable too, but they are unordered:
# they can be iterated over but not indexed or sliced. A range is also an immutable sequence; although range() is
# called like a function, it is actually a type whose call creates a range object.

# A dictionary preserves insertion order starting in Python 3.7. Yet it is a mapping, not a sequence: values are looked
# up by key and it does not support positional indexing or slicing.
# The KEYS of dictionaries must be hashable (in practice, immutable), the values may or may not be mutable and the
# dictionary ITSELF is mutable. Though a tuple itself is immutable, it can hold mutable objects, and those objects
# can still be mutated in place.

# The above lines are bound to make anyone's head spin. But once we dig deeper into each of these datatypes, it will not
# be a headscratcher any more.
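
The rules above can be checked in a few lines. This is only a sketch with made-up values: a tuple refuses item assignment but a list inside it can still grow, a list cannot be a dictionary key while a tuple can, and a frozen set cannot be indexed.

```python
t = (1, [2, 3])
t[1].append(4)            # mutating the list INSIDE the tuple is allowed
print(t)

errors = []
try:
    t[0] = 99             # replacing an element of the tuple itself is not
except TypeError as err:
    errors.append('tuple assignment')

d = {}
try:
    d[[1, 2]] = 'x'       # a list is unhashable, so it cannot be a key
except TypeError as err:
    errors.append('list as key')
d[(1, 2)] = 'ok'          # a tuple of immutable values works as a key

fs = frozenset({1, 2})
try:
    fs[0]                 # frozen sets are unordered: no indexing
except TypeError as err:
    errors.append('frozenset indexing')

print(errors)
```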

In [None]:
# But now we come to the mother of all mutable/immutable exceptions:

## The string of length more than 1

a = 'Python'
print(a[0] is 'P')
print(a[1] is 'y')
print(a[2] is 't')
print(a[3] is 'h')
print(a[4] is 'o')
print(a[5] is 'n')

In [None]:
for x in a:
    print(x)
    
ida = id(a)
print(ida)

In [None]:
# As we can see, a string of more than one character is a sequence: it can be indexed, sliced and iterated over, and
# each index gives back a one-character string. However, a string is STILL immutable in Python, i.e. 
# even if I wished to change just one letter of a string: (i) strings do not support item/index assignment, i.e. changing
# characters inside the string raises a TypeError, and (ii) assigning a new string value to the variable binds the name
# to a different object, usually at a different memory location.

try:
    a[0] = 'p'
except TypeError as err:
    print('TypeError:', err)

print(a)

print(id(a))

In [None]:
a = 'python'

idaa = id(a)
print(id(a))
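
Since a string cannot be edited in place, "changing one letter" always means building a new string. Two common ways are sketched below (the names `lowered` and `replaced` are illustrative); the original string is left untouched in both cases.

```python
text = 'Python'

lowered = 'p' + text[1:]              # new first letter + slice of the rest
replaced = text.replace('P', 'p', 1)  # replace only the first occurrence

print(lowered)
print(replaced)
print(text)                           # the original is unchanged
```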

In [None]:
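# In CPython, id() is the object's memory address, so we can read the object at that address using the ctypes module.
# This is only safe while the object is still alive; doing it for an object that has been freed can crash the interpreter.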

import ctypes

old_id_val = ctypes.cast(ida, ctypes.py_object).value
new_id_val = ctypes.cast(idaa, ctypes.py_object).value

print(f'Comparing values at specific memory location : {old_id_val}, {new_id_val}.')

In [None]:
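# To delete a name, use the del statement (it is a statement, not a function). The object itself is freed once
# nothing else refers to it. Using the name afterwards raises a NameError.

del a

try:
    print(a)
except NameError as err:
    print('NameError:', err)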

### Now that we understand object re-usability, sequential datatypes and mutability and immutability we can return to the other datatypes in Python.

In [None]:
#Range Datatype (range() looks like a built-in function, but calling it actually creates an object of type range)

a = range(0,10,1)
print(a)
print(type(a))

In [None]:
b = [0,1,2,3,4,5,6,7,8,9]

for x in a:
    print(x, id(x) == id(b[x]))

In [None]:
#Range can take in 3 parameters (all positional)

#1. Start parameter - Optional. If not specified defaults to 0.
#2. Stop parameter - Mandatory. The range ends just before this value.
#3. Step - Optional. If not specified defaults to 1. It cannot be 0.

## Please remember though - Start and Step are optional and have default values. BUT - you can never give the step
# value if you do not also give the start value i.e. you can have the following combinations:

# 1. (Only stop value)
# 2. (Start value, Stop value)
# 3. (Start value, Stop value, Step value)

# But cannot have:

# (Stop value, step value) without Start value: two arguments are always read as (start, stop).
# So, stop value is required whether or not start and step are specified. 

# Start value can be specified without step value (but with stop value). However, step value cannot be specified unless
# both start value and stop value are specified.

b = range(0,10,1)

for x in b:
    print(x)

In [None]:
b = range(0,10)

for x in b:
    print(x)

In [None]:
b = range(10)

for x in b:
    print(x)

In [None]:
b = range(0,10,2)

for x in b:
    print(x)

In [None]:
    
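# Two arguments are read as (start, stop), so this is an empty range from 10 up to 2, not "stop 10, step 2"
b = range(10,2)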

for x in b:
    print(x)

In [None]:
c = range(1,10,1)

for x in c:
    print(x)

In [None]:
c = range(1,10,2)

for x in c:
    print(x)

In [None]:
#Using range in reverse order

b = range(10,0,-1)

for x in b:
    print(x)

In [None]:
#Using range to generate negative numbers

b = range(0,-10,-1)

for x in b:
    print(x)

In [None]:
y = 3
print(f'When slicing b we get back a range object {b[-2:-5]}')
print(f'Object at index {y} of b is {b[y]}')
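
A range does not store its numbers; it computes them from start, stop and step when asked. The sketch below illustrates this with a large range. The size comparison at the end assumes CPython and is only meant to show the order of magnitude.

```python
import sys

r = range(0, 1_000_000, 2)

print(len(r))          # number of values, computed without generating them
print(r[10])           # indexing is computed from start and step
print(999_998 in r)    # membership test is arithmetic, not a scan
print(999_999 in r)
print(r[:3])           # slicing gives back another range object

# A range object stays small no matter how many values it represents.
print(sys.getsizeof(r), sys.getsizeof(list(range(1000))))
```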

In [None]:
### Bytes datatype

# Bytes datatype in Python is an immutable sequence of bytes. 
# Each element can take values from 0 to 255.
# A bytes datatype is iterable.

# 8 bits make up a byte. The maximum value of 8 bits is 255. 
# Conveniently, two hexadecimal digits cover exactly the same range (0x00 to 0xff). 

bint = bin(255)
hexd = hex(255)

print(bint, hexd, sep = '\n')

In [None]:
### bytes() returns a new immutable bytes object built from the original object.

# It can be used on an integer (a size), an iterable of integers, or a string. 
# It takes 3 parameters - 

#source - The source object to be converted
#encoding - (Required for strings) the encoding to use, e.g. ASCII, UTF-8 or UTF-16
#errors - In case of an error while encoding a string input, how to handle the error

In [None]:
### Bytes function with no arguments

x = bytes()
print(x)

# # This returns an empty bytes object. Note the output below with the prefix b and then an empty string. 

In [None]:
# ### Bytes function with one argument

x = bytes(5)
print(x)

# # This returns a bytes object of the specified size with every byte initialised to 0 (shown as \x00)

In [None]:
for a in x:
    print(a)

In [None]:
    
### Bytes function with an iterable object (or string character, more on that later)

x = bytes([1,2,7,79])
print(x)

In [None]:
for a in x:
    print(a)

In [None]:
## Bytes function with the highest allowed value, 255 (i.e. all 8 bits set); 256 or more raises a ValueError

x = bytes([1,2,255,7])
print(x)

In [None]:
## Bytes object to binary

for a in x:
    print(bin(a))
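
The 0 to 255 limit and the link between bytes and two-digit hexadecimal can be seen with the built-in `hex()` and `fromhex()` methods of `bytes`. A brief sketch with illustrative names:

```python
out_of_range = False
try:
    bytes([1, 2, 256])            # 256 does not fit in one byte
except ValueError as err:
    out_of_range = True
    print('ValueError:', err)

data = bytes([1, 2, 255, 7])
hex_text = data.hex()             # two hexadecimal digits per byte
print(hex_text)
print(bytes.fromhex(hex_text) == data)   # round trip back to the same bytes
print(list(data))                 # the underlying integer values
```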


In [None]:
#### Bytes-array with string

### ASCII (128 characters) and cp1252 are character encodings. Unicode is a character set, and UTF-8, UTF-16 and UTF-32
### are different encodings of it that use code units of 8, 16 and 32 bits respectively. Each of them can represent
### all 1,112,064 valid Unicode code points.

### Each character in each of these encoding systems is assigned a unique byte value (or sequence of bytes). When converting a string to bytes, 
### each character in the string gets encoded as that value (depending on the encoding system specified).


### Note that in Python 3 the default encoding for str.encode() and for source files is UTF-8. 

### However, the default encoding of the built-in function open() <which we will see while doing file handling>
### depends on the platform and can be checked as follows

# Works on any platform; on Windows it often reports a code page such as cp1252

import locale
locale.getpreferredencoding()

In [None]:
# ### Bytes array with string, without specifying encoding will throw an error

x = 'Python'

print(bytes(x))

In [None]:
print(bytes(x,'utf-8'))
print(bytes(x,'ascii'))
print(bytes(x,'cp1252'))

In [None]:
## Note how the output of the string encoding still looks like text? Internally, Python has converted each of these characters to
## their corresponding byte values in the respective encoding. When printing a bytes object, Python shows bytes that are printable
## ASCII as characters and all other bytes as \x escapes, prefixed by b <b'text'>.

In [None]:
### We can also specify how to handle errors in the bytes function when encoding strings.

#Ignore - Ignores unencodable character and returns the rest of the string encoded
p = 'ⱣytꞪon'

print(bytes(p,'ascii',errors='ignore'))
print(bytes(p,'cp1252',errors='ignore'))

In [None]:
print(bytes(p,'utf-8',errors='ignore'))

# Take a look at the last bit of output for the utf-8 encoding output

b'\xe2\xb1\xa3yt\xea\x9e\xaaon'

# In the UTF-8 encoding the first section:

# \xe2\xb1\xa3 is our special character Ᵽ

# 2nd section:

# yt are our ASCII encodable characters and human-readable characters 'yt'

# 3rd section:

# \xea\x9e\xaa is our special character Ɦ

# 4th section:

# on are again ASCII encodable

# UTF-8 can encode every Unicode character, which is why nothing had to be ignored here.

In [None]:
# Replace - Replace the unencodable character to ? along with the rest of the string encoded.

print(bytes(p,'ascii',errors='replace'))
print(bytes(p,'cp1252',errors='replace'))

print(bytes(p,'utf-8',errors='replace')) # Note - nothing was replaced here.

In [None]:
#Strict - Throws an error if it finds an un-encodable character.

print(bytes(p,'ascii',errors='strict'))

In [None]:
print(bytes(p,'cp1252',errors='strict'))

In [None]:
print(bytes(p,'utf-8',errors='strict'))
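
The same conversions are more commonly written with `str.encode()` and `bytes.decode()`, which take the same encoding and errors arguments. The sketch below uses a short sample word (not from the notebook above) containing one non-ASCII character, written as an escape so that the example does not depend on the file encoding.

```python
text = 'caf\u00e9'                 # 'café': 4 characters, the last one is not ASCII

utf8 = text.encode('utf-8')        # the non-ASCII character takes 2 bytes in UTF-8
cp1252 = text.encode('cp1252')     # and 1 byte in cp1252
print(utf8, cp1252)
print(len(text), len(utf8), len(cp1252))

print(text.encode('ascii', errors='replace'))  # unencodable character becomes ?
print(text.encode('ascii', errors='ignore'))   # unencodable character is dropped

print(utf8.decode('utf-8') == text)            # decoding restores the original string
```

Decoding must use the same encoding that was used for encoding; otherwise the result is either an error or garbled text.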

In [None]:
pyt = 'Pythhhhhhon'
z = bytes(pyt,'ascii',errors='strict')
print(z)

for i in z:
    print(i)

In [None]:
# A bytes object is immutable. It cannot be changed: item assignment raises a TypeError.

try:
    z[0] = 112
except TypeError as err:
    print('TypeError:', err)

In [None]:
# Since it is a sequence, the usual sequence operations (len, count, index, slicing etc.) can be used here. 

print(len(z))
print(z.count(b'h'))

In [None]:
#### Bytearray is another datatype in Python for dealing with Bytes. However, bytearray object IS mutable. 

x = 'Python'

z = bytearray(x, 'ascii')
print(z)

In [None]:
for i in z:
    print(i)

In [None]:
z[0] = 112
print(z)

In [None]:
z[0] = 65
print(z)

In [None]:
print(z[:2])

# Most other operations and parameters for bytearray are the same as for bytes; bytearray additionally has in-place methods such as append() and extend().
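
To round off the comparison, here is a small sketch of the in-place operations that a bytearray supports and a bytes object does not. The values are illustrative; converting back with `bytes()` gives an immutable copy.

```python
buf = bytearray(b'Python')

buf[0] = ord('J')      # item assignment takes an integer from 0 to 255
buf.append(33)         # 33 is the byte value of '!'
buf.extend(b'??')      # add several bytes at once
del buf[-2:]           # remove the last two bytes again
print(buf)

frozen = bytes(buf)    # immutable snapshot of the current contents
print(frozen)
```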