# Python classes

Classes define objects, just like the base data types and structures in Python. They can have attributes and functions that belong to them, creating very powerful structures with their own built-in toolbox.

## Defining a class

Every class definition begins with `class`. Classes can be created from scratch or built on a previously existing class, allowing you to extend and customize it. If you don't provide a parent class, it defaults to the base `object` class, which supplies minimal default versions of the built-in special functions, so a new class works without errors but does nothing useful until you define its behavior. We can also define attributes that every object of that class will contain.

In [2]:
# Define our first class and give it a doc string
class First():
    """My first class"""
    
a = First()
print(a.__doc__)

My first class
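
The cell above only shows the docstring. As a minimal sketch of the other two ideas mentioned earlier, building on a parent class and defining attributes shared by every object, consider the following. The `Animal` and `Bird` classes and their attributes are hypothetical names made up for this illustration.

```python
# Hypothetical classes, used only to illustrate inheritance and class attributes
class Animal:
    """A generic animal"""
    legs = 4  # class attribute: shared by every instance unless overridden

    def __init__(self, name):
        self.name = name  # instance attribute: set separately for each object

    def describe(self):
        return "{} has {} legs".format(self.name, self.legs)


class Bird(Animal):
    """A bird, built on top of Animal"""
    legs = 2  # override the inherited class attribute

    def describe(self):
        # Reuse the parent's method and extend its result
        return super().describe() + " and can fly"


dog = Animal("Rex")
robin = Bird("Robin")
print(dog.describe())
print(robin.describe())
print(isinstance(robin, Animal))  # a Bird is also an Animal
```

`Bird` never defines `__init__`, so it inherits the one from `Animal`; only the pieces that differ need to be written again.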


Every class has several specially named functions and attributes. Their names always begin and end with `__`, such as `__init__`. Some of the typical special names that you may care about are listed below (all are functions except `__doc__`, an attribute that holds the doc string):

- `__init__`
- `__len__`
- `__str__`
- `__doc__`
- `__getitem__`
- `__setitem__`
- `__iter__`
- `__next__`
- `__del__`
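
Most of these names are never called directly; Python calls them for you when you use the matching built-in operation. The sketch below uses a hypothetical `Shelf` class (not part of the original notebook) to show which operation is assumed to trigger which function.

```python
# Hypothetical class showing which built-in operation triggers each special function
class Shelf:
    """A tiny container of labelled items"""

    def __init__(self):                  # called by Shelf()
        self.items = {}

    def __setitem__(self, key, value):   # called by shelf[key] = value
        self.items[key] = value

    def __getitem__(self, key):          # called by shelf[key]
        return self.items[key]

    def __len__(self):                   # called by len(shelf)
        return len(self.items)

    def __str__(self):                   # called by str(shelf) and print(shelf)
        return "Shelf holding {} item(s)".format(len(self))


shelf = Shelf()
shelf["top"] = "books"
shelf["bottom"] = "shoes"
print(shelf["top"])
print(len(shelf))
print(shelf)
```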


In [4]:
# Make a class that has a constructor, string, and delete method
class Second():
    """My second class"""
    
    def __init__(self, value):
        self.value = value
        
    def __str__(self):
        return "My value: {}".format(self.value)
    
    def __del__(self):
        print("Goodbye cruel world")
        
b = Second(5)
print(b)
del b

My value: 5
Goodbye cruel world


We can even extend a built-in Python data type, like a dictionary.

In [7]:
# Write our own dictionary class
class MyDict(dict):
    """My own dictionary"""
    counter = 0
    
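    def __setitem__(self, key, value):
        # Store the value in the underlying dict rather than as an attribute
        super().__setitem__(key, value)
    
    def __getitem__(self, key):
        self.counter += 1
        if self.counter == 4:
            self.counter = 0
            return "Glitter!!!!!"
        else:
            # Look the key up in the underlying dict
            return super().__getitem__(key)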
        
c = MyDict()
c['a'] = 1
for i in range(8):
    print(c['a'])

1
1
1
Glitter!!!!!
1
1
1
Glitter!!!!!


Using classes means that we can store many kinds of information together and give the class's own functions access to it, without having to keep track of lots of separate variables.

In [9]:
# Make a class to describe a rectangle
class Rect(object):
    def __init__(self, height=1, width=1):
        self.height = height
        self.width = width
        
    def area(self):
        return self.height * self.width
        
    def __str__(self):
        return "Rectangle with sides of length {} and {}".format(self.height, self.width)
    
    def __len__(self):
        return 1
        
rect = Rect(2, 3)
print(len(rect))
print(rect)
print(rect.area())

1
Rectangle with sides of length 2 and 3
6


What about something more practical? A common problem (especially in a Jupyter notebook) is that if you read through a file without closing it and then try to read it again, it doesn't return any lines, because the file position is still at the end of the file. Let's make a class to fix this behavior.

In [13]:
# File reader with automatic reset
class FileReader(object):
    def __init__(self, fname):
        self.fname = fname
        self.filehandle = open(fname, 'r')
        self.reset = False
        
    def __iter__(self):
        if self.reset:
            self.filehandle.seek(0)
            self.reset = False
        return self
    
    def __next__(self):
        line = self.filehandle.readline()
        if line == '':
            self.reset = True
            raise StopIteration
        return line
    
    def reset_file(self):
        self.reset = True
        
    def __del__(self):
        self.filehandle.close()
        
fr = FileReader('test.txt')
for i, line in enumerate(fr):
    print(line, end='')
    if i == 1:
        break
    
fr.reset_file()

for line in fr:
    print(line, end='')

del fr

one
two
one
two
three
four
five
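
The reader above works in a `for` loop because it provides `__iter__` and `__next__`. As a minimal sketch of that protocol without any file handling, here is a hypothetical `Countdown` class (a made-up example), followed by roughly what a `for` loop does behind the scenes.

```python
# Hypothetical iterator, used only to show the protocol a for loop relies on
class Countdown:
    def __init__(self, start):
        self.start = start
        self.current = start

    def __iter__(self):
        # Called once at the start of a for loop; rewind so each loop starts fresh
        self.current = self.start
        return self

    def __next__(self):
        # Called repeatedly until StopIteration is raised
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


cd = Countdown(3)
for n in cd:
    print(n)

# Roughly what the for loop above does behind the scenes
it = iter(cd)
while True:
    try:
        n = next(it)
    except StopIteration:
        break
    print(n)
```

Both loops print 3, 2, 1, because `__iter__` rewinds the counter each time a new loop begins.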


This file reader class demonstrates a powerful aspect of classes: the ability to create your own iterator. What if we want something more specific? Yesterday you worked with a ctab data file, which has multiple entries per gene, one per transcript. What if we want to read out one gene at a time, including all of its transcripts?

In [12]:
# Ctab reader
class ctabReader(object):
    def __init__(self, fname):
        self.fname = fname
        self.fh = open(self.fname, 'r')
        self.reset_file = False
        self.skip_header()
        
    def __iter__(self):
        if self.reset_file:
            self.fh.seek(0)
            self.skip_header()
            # Clear the flag so later loops only rewind when asked to
            self.reset_file = False
        return self
    
    def skip_header(self):
        self.fh.readline()
        self.line = self.fh.readline()
        
    def __next__(self):
        if not self.line:
            self.reset_file = True
            raise StopIteration
        fields = self.line.rstrip('\n\r').split('\t')
        gene = fields[-3]
        transcripts = [fields]
        line = self.fh.readline()
        fields = line.rstrip('\n\r').split('\t')
        # Collect following lines for as long as they belong to the same gene
        while len(fields) > 2 and fields[-3] == gene:
            transcripts.append(fields)
            line = self.fh.readline()
            fields = line.rstrip('\n\r').split('\t')
        self.line = line
        return gene, transcripts
    
    def reset(self):
        self.reset_file = True
        
cr = ctabReader('../../../qbb2021/data/SRR072893.t_data.ctab')
print(next(cr))
print(next(cr))
cr.reset()

genes = {}
for gene, transcripts in cr:
    genes[gene] = transcripts

for i, pair in enumerate(genes.items()):
    if i > 5:
        break
    print(pair[0])
    for t in pair[1]:
        print("  ", t)


('CR41571', [['1', '3R', '-', '722370', '722621', 'FBtr0114258', '1', '252', 'FBgn0085804', 'CR41571', '0.000000', '0.000000']])
('CG45784', [['2', '3R', '+', '835381', '2503907', 'FBtr0346770', '20', '6213', 'FBgn0267431', 'CG45784', '0.000000', '0.000000']])
CR45220
   ['6', '3R', '-', '2744304', '2744800', 'FBtr0345282', '2', '438', 'FBgn0266747', 'CR45220', '0.000000', '0.000000']
   ['7', '3R', '-', '2744305', '2744800', 'FBtr0345281', '1', '496', 'FBgn0266747', 'CR45220', '0.231855', '0.284474']
CR40182
   ['4', '3R', '-', '2156916', '2157206', 'FBtr0302347', '1', '291', 'FBgn0058182', 'CR40182', '14.020618', '17.202570']
Gfat1
   ['22', '3R', '+', '4126442', '4137882', 'FBtr0302850', '9', '2628', 'FBgn0027341', 'Gfat1', '0.000000', '0.000000']
   ['23', '3R', '+', '4130521', '4137882', 'FBtr0113687', '9', '2426', 'FBgn0027341', 'Gfat1', '0.000000', '0.000000']
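
To experiment with the grouping idea without the data file, the following sketch applies it to a few made-up rows held in memory. It swaps in `itertools.groupby` from the standard library in place of the hand-written loop, and it assumes the same layout as above: one header line, tab-separated fields, and the gene name third from the last column. The header names, the `genes_from` helper, and all values in the rows are invented for illustration.

```python
import io
from itertools import groupby

# Made-up rows in an assumed ctab-like layout; the gene name is third from last
sample = io.StringIO(
    "t_id\tchr\tstrand\tstart\tend\tt_name\tnum_exons\tlength\tgene_id\tgene_name\tcov\tFPKM\n"
    "1\tchrX\t+\t100\t200\ttxA1\t1\t101\tgA\tgeneA\t0.0\t0.0\n"
    "2\tchrX\t+\t100\t300\ttxA2\t2\t150\tgA\tgeneA\t1.5\t2.0\n"
    "3\tchrX\t-\t400\t500\ttxB1\t1\t101\tgB\tgeneB\t0.0\t0.0\n"
    "4\tchrX\t+\t600\t700\ttxC1\t1\t101\tgC\tgeneC\t0.0\t0.0\n"
    "5\tchrX\t+\t600\t800\ttxC2\t2\t150\tgC\tgeneC\t0.0\t0.0\n"
    "6\tchrX\t+\t650\t800\ttxC3\t1\t151\tgC\tgeneC\t3.0\t4.0\n"
)


def genes_from(handle):
    """Yield (gene, transcripts) for runs of consecutive rows sharing a gene name."""
    next(handle)  # skip the header line
    rows = (line.rstrip('\n\r').split('\t') for line in handle if line.strip())
    # groupby only merges neighbouring rows, so the input must be ordered by gene
    for gene, group in groupby(rows, key=lambda fields: fields[-3]):
        yield gene, list(group)


for gene, transcripts in genes_from(sample):
    print(gene, len(transcripts))
```

Like the class above, this only groups rows that sit next to each other, so a gene whose transcripts are scattered through the file would be yielded more than once.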
   