# BF-931 Advanced Topics in Mycology I - Introduction to Biopython

## Graduate Program in Fungal Biology (Programa de Pós-Graduação em Biologia de Fungos) - [PPGBF](https://www.ufpe.br/ppgbf)

### [**Dra. Angelina de Meiras-Ottoni**](http://lattes.cnpq.br/5692217174749691)

**LinkedIn:** [Angelina Meiras Ottoni](https://www.linkedin.com/in/angelina-meiras-ottoni/)

**GitHub:** [AngelOttoni](https://github.com/AngelOttoni)

[Source repository](https://github.com/AngelOttoni/learning-biopython)

# **Python libraries**

- In Python, libraries are collections of pre-written code that extend the language's capabilities beyond its core features.

- These libraries are developed to address specific tasks and domains, such as data manipulation, scientific computing, machine learning, web development, and more.

- They are created to save developers time and effort by providing ready-to-use functions and tools, making it easier to implement complex tasks without having to write everything from scratch.

- Python's extensive library ecosystem is one of its major strengths, as it allows developers to leverage existing solutions to common problems rather than reinventing the wheel.

- By importing and using these libraries in their code, developers can access a wide range of functionalities, making their programs more powerful, efficient, and feature-rich.

Libraries in Python can be distributed in different ways:

1. **[Standard Library:](https://docs.python.org/3/library/)** Python comes with a comprehensive standard library, which includes modules for handling various tasks, such as file I/O, regular expressions, networking, data structures, and more.

  - These modules are available by default and do not require additional installation.

2. **Third-Party Libraries:** Apart from the standard library, there are numerous third-party libraries developed by the Python community and other organizations.
  - These libraries cover diverse areas like data analysis, web development, artificial intelligence, game development, and more.
  - To use these libraries, developers first need to install them with a package manager such as [pip](https://pip.pypa.io/en/stable/), [the package installer for Python](https://packaging.python.org/en/latest/tutorials/installing-packages/).
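As a minimal sketch of the first case, the snippet below relies only on `collections.Counter` from the standard library, so nothing has to be installed. The DNA string is a made-up example, not course data.

```python
from collections import Counter

# Hypothetical DNA fragment used only for illustration
dna = "ATGCGATACGCTTGA"

# Counter tallies how many times each nucleotide occurs
counts = Counter(dna)

# GC content = fraction of bases that are G or C
gc_content = (counts["G"] + counts["C"]) / len(dna)

print(counts)
print(f"GC content: {gc_content:.2%}")
```

A third-party library such as NumPy or Biopython is imported in the same way, but only after it has been installed.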

**Here's an example of importing and using a library in Python:**

In [None]:
# Importing the NumPy library
import numpy as np

# Creating a NumPy array
my_array = np.array([1, 2, 3, 4, 5])

# Performing a calculation with NumPy
result = np.sum(my_array)

# Printing the result
print(result)

- In this example, we imported the [NumPy library](https://numpy.org/) using the `import` statement and used its functions and capabilities to create an array and calculate its sum efficiently.

- Overall, libraries in Python play a vital role in expanding the language's functionality, making it a versatile and powerful choice for a wide range of applications.

- Developers can explore and leverage different libraries based on their project requirements to streamline development and enhance their code's capabilities.

# **Biopython**



---

- [Biopython](https://biopython.org/) is a powerful and popular open-source Python library designed to facilitate bioinformatics and computational biology tasks.

- It provides a wide range of functionalities for handling biological data, including DNA, RNA, protein sequences, and various biological file formats.

- Biopython is a valuable resource for researchers and bioinformaticians to analyze, manipulate, and process biological data efficiently.

## **History**



---



1. **Early Development (1999-2000):**
   Biopython originated in 1999 when **Jeff Chang**, a graduate student at Stanford University, started the project as a collection of Python modules to work with biological data. Initially, it was a personal project for Jeff's research needs.

2. **Project Expansion (2000-2006):**
   As the project gained popularity, Jeff decided to **release Biopython under an open-source license in 2000**, making it freely available for the bioinformatics community.
   
    - Over the next few years, the community started to grow, and more contributors joined the project.
    - Biopython expanded its scope to encompass a broader range of bioinformatics tasks.

3. **Biopython Tutorial (2003):**
   **Brad Chapman**, one of the contributors, created the **first Biopython tutorial in 2003**, which made it easier for users to learn and get started with the library.

4. **Stable Releases (2004-2010):**
   The Biopython project started to achieve stability with regular releases, making it more reliable for research and analysis.
   
    - The library incorporated functionalities like sequence handling, file parsing, sequence analysis, and phylogenetics.

5. **Integration of BioSQL (2005-2006):**
   Biopython **integrated with BioSQL**, a database schema designed for storing and querying biological data.

6. **Continued Growth (2010-2015):**
   Biopython continued to grow, attracting more contributors from various backgrounds, including academia, industry, and the **open-source** community.

7. **Python 3 Support (2015-2016):**
   With the increasing importance of Python 3, Biopython underwent significant changes to ensure compatibility and support for Python 3, along with maintaining support for Python 2.

8. **Modernization and PEP 8 Compliance (2017-2018):**
   Biopython focused on code modernization and improved adherence to the PEP 8 style guide, making the codebase cleaner and easier to maintain.

9. **Beyond the 1.70 Release (2019-2021):**
   The Biopython project continued to progress, with efforts on performance optimization, bug fixes, and new feature additions. The community remained active, supporting users and encouraging contributions.

**Here's a brief introduction to some of the key features and functionalities of Biopython:**


---



1. **Sequence Handling:**
   Biopython allows you to work with biological sequences, such as DNA, RNA, and protein sequences.
    - It provides data structures to store and manipulate these sequences, as well as methods for accessing individual elements, calculating sequence statistics, and performing sequence alignments.

2. **File Parsing:**
   The library supports reading and writing various biological file formats, such as FASTA, GenBank, PDB (Protein Data Bank), Clustal, and many more.
    - This capability is essential for loading data from external sources and saving results in a standardized format.

3. **Sequence Analysis:**
   Biopython offers a wide range of tools for sequence analysis, including functions for sequence similarity searches, motif finding, ORF (Open Reading Frame) prediction, translation, reverse complement, and more.
   
    - It can also submit BLAST searches to NCBI and read the output of external alignment tools such as ClustalW.

4. **Phylogenetics:**
   Biopython enables you to perform phylogenetic analyses, such as building phylogenetic trees, calculating evolutionary distances, and conducting bootstrap analysis.
   
    - These workflows are handled by Biopython's own `Bio.Phylo` module.

5. **Structural Biology:**
   The library provides functionalities to work with 3D protein structures, including parsing PDB files, analyzing protein structure properties, and performing structural alignments.

6. **Bioinformatics Utilities:**
   Biopython includes several utilities for handling biological data, such as codon tables (genetic codes) and functions for calculating molecular weights.
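A few of the features listed above can be sketched with the `Seq` class. This is only an illustration, assuming Biopython is already installed; the DNA sequence is an arbitrary example and the variable names are chosen here for readability.

```python
from Bio.Seq import Seq
from Bio.Data import CodonTable

# Arbitrary coding sequence used only for illustration
dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")

# Sequence handling: a simple statistic computed from base counts
gc_fraction = (dna.count("G") + dna.count("C")) / len(dna)

# Sequence analysis: reverse complement, transcription, translation
reverse_complement = dna.reverse_complement()
mrna = dna.transcribe()
protein = dna.translate()                    # '*' marks stop codons
first_peptide = dna.translate(to_stop=True)  # ends before the first stop codon

# Utilities: the standard genetic code (NCBI translation table 1)
standard_table = CodonTable.unambiguous_dna_by_id[1]

print("Length:", len(dna))
print(f"GC fraction: {gc_fraction:.2f}")
print("Reverse complement:", reverse_complement)
print("mRNA:", mrna)
print("Protein:", protein)
print("First peptide:", first_peptide)
print("Stop codons:", standard_table.stop_codons)
```

File parsing, phylogenetics, and structural biology live in separate modules (`Bio.SeqIO`, `Bio.Phylo`, and `Bio.PDB`) and follow the same import pattern.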

**To start using Biopython, you first need to [install](https://biopython.org/wiki/Download) it using `pip`:**

In [None]:
!pip install biopython

- Once installed, you can import the Biopython modules in your Python script and begin exploring the library's capabilities.
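Before going further, it can help to confirm that the installation worked. The sketch below uses the standard-library function `importlib.util.find_spec` to check whether the `Bio` package is importable; the helper name `is_installed` is hypothetical and introduced only for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module or package can be imported."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("Bio"):
    import Bio

    print("Biopython version:", Bio.__version__)
else:
    print("Biopython not found; run `pip install biopython` first.")
```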

In [None]:
# Remember: upload your FASTA file before running the next cell

In [None]:
from Bio import SeqIO

# Example: Reading a FASTA file
filename = "filename.fasta"
for record in SeqIO.parse(filename, "fasta"):
    print("ID:", record.id)
    print("Sequence:", record.seq)
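If no FASTA file is at hand, the same parsing loop can be tried on text held in memory. This is a sketch under stated assumptions: the two records below are invented for illustration, and `io.StringIO` stands in for a file, since `SeqIO.parse` accepts an open text handle as well as a file name.

```python
from io import StringIO

from Bio import SeqIO

# Two made-up FASTA records, kept in memory instead of a file on disk
fasta_text = """>seq1 hypothetical example record
ATGCGTACGTTAG
>seq2 another hypothetical record
GGCATTACG
"""

# Parse the in-memory text exactly as a real FASTA file would be parsed
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))

for record in records:
    print("ID:", record.id)
    print("Description:", record.description)
    print("Length:", len(record.seq))
```

Each item returned by the parser is a `SeqRecord`, so attributes such as `id`, `description`, and `seq` are available for every sequence in the file.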

## **Some necessary clarifications:**


---



1. **Module:** A module is a set of functions, types, classes, or variables grouped together in a common namespace.

  - It is a single file containing Python code and can be imported and used in other Python programs.

2. **Library:** A library is a collection of modules (and sometimes additional resources) that are designed to be used in various programs.

  - It typically contains reusable code that provides specific functionalities. Libraries are not necessarily confined to Python; they can be written in other programming languages as well.

3. **Package:** A package is a way of organizing related modules and sub-packages in a directory hierarchy.

  - It is essentially a directory that contains a special file called `__init__.py`, which marks the directory as a regular package (since Python 3.3, namespace packages can omit this file).
  
  - A package can include multiple modules and sub-packages, making it a higher-level organizational concept in Python.

**So, to summarize:**


- A **module** is a single file containing Python code.

- A **library** is a collection of modules (and potentially additional resources) that offer specific functionality and can be used in various programs.

- A **package** is a directory that contains one or more modules and sub-packages, usually marked by an `__init__.py` file, and is used for organizing related functionality.

- **A note on terminology:** the definition "*Package*: distribution unit that can contain a library, an executable, or both" comes from the context of software packaging and distribution, and it is not the same thing as the Python package (import package) described above.

- In the context of packaging, a package could refer to a distribution unit that contains the necessary files to distribute and install a library or an executable.

- This distribution unit may include metadata, configuration files, documentation, and code, allowing users to easily install and use the software on their systems.
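The difference between a module and a package can be checked from Python itself. The sketch below uses two standard-library examples: `random` is a single-file module, while `json` is a package. The helper `is_package` is a hypothetical name introduced here; it relies on the fact that packages expose a `__path__` attribute and plain modules do not.

```python
import json    # standard-library package (a directory with __init__.py)
import random  # standard-library module (a single random.py file)


def is_package(module):
    """Return True if the imported object is a package."""
    # Packages expose a __path__ attribute; plain modules do not
    return hasattr(module, "__path__")


for mod in (json, random):
    kind = "package" if is_package(mod) else "module"
    print(f"{mod.__name__}: {kind}")
```

Applied to Biopython, `Bio` is a package and `Bio.SeqIO` is one of its sub-packages, whereas `biopython` is the name of the distribution unit installed with `pip`.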

In [None]:
print("The end")

### Reference links:


---

[Python](https://www.python.org/)

[Bio package](https://biopython.org/docs/1.75/api/Bio.html)

[Biopython](https://biopython.org/)

[Open Bioinformatics Foundation](https://www.open-bio.org/)

[Biostars](https://www.biostars.org/)

[Stack Overflow (biopython tag)](https://stackoverflow.com/questions/tagged/biopython)