# A Primer on Basic Python Syntax

Python's syntax is designed for readability and efficiency, making it an ideal language for scientific computing in fields like biochemistry. Here's an overview of some basic syntax rules and concepts:

## Importing Libraries
- `Libraries` are collections of prepackaged code suited to specific tasks. Python is `modular`, allowing you to import only the functionality you need.
- Use the `import` statement to include external libraries, essential for scientific computing.
- Example libraries: `numpy` for numerical operations, `matplotlib.pyplot` for plotting.
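
As a minimal sketch, the import forms you are most likely to meet look like this. The alias `np` is a community convention rather than a requirement, and the `math` module is used here only as a convenient standard-library illustration.

```python
# Import a whole library under a short alias (np is the conventional alias for NumPy)
import numpy as np

# Import a module from Python's standard library
import math

# Import a single name from a module
from math import sqrt

# Use the imported names with dot notation (or directly, for `from` imports)
print(np.sqrt(16.0))   # NumPy's square root
print(math.pi)         # a constant from the math module
print(sqrt(9))         # the directly imported function
```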

## Variables and Printing
- Assign values to variables without declaring their type, thanks to Python's dynamic typing.
- `Variables` can store different data types such as floating-point numbers, integers, strings, and more complex data structures such as dictionaries.
- `Variable` names can start with a letter or an underscore (`_`). They cannot start with a number.
- Use the `print()` function for outputting data. This is the simplest way to test your code.

## Data Structures
### Lists
- Create lists using square brackets and comma-separated items, e.g. `[1, 2, 3]`.
- Store ordered collections of items, like amino acids `['Arg', 'Lys', 'Cys']` (note that strings must be quoted).
- Iterate over lists with `for` loops to perform operations on each element.
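
A short sketch of common list operations follows. The list contents are illustrative; `append()`, `len()`, and `enumerate()` are standard Python built-ins.

```python
# A list of three-letter amino acid codes
residues = ['Arg', 'Lys', 'Cys']

# Indexing starts at 0; negative indices count from the end
first = residues[0]
last = residues[-1]
print("First:", first, "Last:", last)

# Lists are mutable: append() adds an item to the end
residues.append('His')

# len() returns the number of items
print("Number of residues:", len(residues))

# enumerate() yields each position together with its item
for position, residue in enumerate(residues):
    print(position, residue)
```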

### Dictionaries
- Use curly braces `{}` to define `dictionaries`.
- Store data as key-value pairs, ideal for mapping relationships, like amino acids to their properties `{'A': 1, 'B': 2, 'C': 3}`.
- Access values by referencing their keys.
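
The following sketch shows the basic dictionary operations. The dictionary name and contents are chosen for illustration.

```python
# Map one-letter nucleobase codes to their full names
base_names = {'A': 'adenine', 'C': 'cytosine', 'G': 'guanine', 'T': 'thymine'}

# Look up a value by its key
print(base_names['G'])

# Add a new key-value pair (or overwrite an existing one)
base_names['U'] = 'uracil'

# .get() returns a fallback instead of raising KeyError for a missing key
unknown = base_names.get('X', 'unknown base')
print(unknown)

# .items() yields (key, value) pairs for looping
for letter, name in base_names.items():
    print(letter, "->", name)
```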

## Functions
- Define functions using the `def` keyword.
- Encapsulate reusable code, like calculating the molecular weight of an input DNA sequence.
- Return values using the `return` statement.
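
As one possible illustration, the function below computes the GC content of a sequence. The name `gc_content` and its `as_percent` parameter are hypothetical, introduced only for this sketch; an empty sequence would raise a `ZeroDivisionError`, which is left unhandled for brevity.

```python
def gc_content(sequence, as_percent=True):
    """Return the fraction (or percentage) of G and C bases in a sequence."""
    sequence = sequence.upper()                      # accept lower-case input
    gc_count = sequence.count('G') + sequence.count('C')
    fraction = gc_count / len(sequence)
    if as_percent:
        return fraction * 100
    return fraction

# Calling the function with positional and keyword arguments
print("GC content (%):", gc_content("AGCTCGTACGATCG"))
print("GC content (fraction):", gc_content("AGCTCGTACGATCG", as_percent=False))
```

The default value `as_percent=True` means the second argument can be omitted in the common case.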

## NumPy Arrays
- Utilize `NumPy arrays` for efficient numerical computations.
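- They behave like matrices: values are arranged in a grid of one or more dimensions. Note that arithmetic operators such as `*` act element-wise rather than performing matrix multiplication.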
- Perform operations on many values at once, or single values.
- Arrays are useful for handling large datasets.
- Arrays are also a useful way to encode images in Python.
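
A brief sketch of these array behaviors is shown below. The absorbance readings are made-up numbers used only for illustration.

```python
import numpy as np

# Made-up absorbance readings for illustration
absorbance = np.array([0.12, 0.25, 0.49, 0.98])

# Arithmetic applies to every element at once (no explicit loop needed)
doubled = absorbance * 2
blank_corrected = absorbance - 0.02

# Indexing retrieves a single value, as with lists
first_reading = absorbance[0]

# A boolean mask selects the elements that satisfy a condition
high_readings = absorbance[absorbance > 0.4]

# Two-dimensional arrays have rows and columns, like a small grayscale image
image = np.zeros((2, 3))

print(doubled)
print(first_reading)
print(high_readings)
print(image.shape)
```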

## Plotting with Matplotlib
- Use Matplotlib for data visualization.
- Create plots, histograms, scatter plots, etc., to visually represent data.
- Another plotting library worth noting is `seaborn`; we'll talk more about that later.
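
As a rough sketch, a scatter plot and a histogram can be drawn side by side as follows. The data are made up for illustration, and this example uses Matplotlib's `plt.subplots()` interface, which is one of several equivalent ways to build a figure.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up measurements for illustration only
time_min = np.array([0, 1, 2, 3, 4, 5])
signal = np.array([0.05, 0.11, 0.19, 0.32, 0.41, 0.48])

# One figure with two side-by-side panels
fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Left panel: scatter plot of signal against time
ax_scatter.scatter(time_min, signal)
ax_scatter.set_xlabel("Time (min)")
ax_scatter.set_ylabel("Signal (arbitrary units)")
ax_scatter.set_title("Scatter plot")

# Right panel: histogram of the signal values, grouped into 3 bins
ax_hist.hist(signal, bins=3)
ax_hist.set_xlabel("Signal (arbitrary units)")
ax_hist.set_ylabel("Count")
ax_hist.set_title("Histogram")

fig.tight_layout()
plt.show()
```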

## Advanced Data Structures
- Work with complex data structures, like nested dictionaries, for more sophisticated data modeling.
- Store and access detailed information, such as the properties of various amino acids.
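
A nested dictionary can be sketched like this. The dictionary name and the `code` and `contains_sulfur` fields are introduced here for illustration, and the property values are examples only; check a reference table before relying on them.

```python
# Outer keys are amino acid names; each value is itself a dictionary
amino_acids_info = {
    'Alanine': {'code': 'Ala', 'pKa': 2.34, 'polarity': 'nonpolar'},
    'Cysteine': {'code': 'Cys', 'pKa': 1.96, 'polarity': 'nonpolar'},
}

# Chain two keys to reach a value in the inner dictionary
alanine_pka = amino_acids_info['Alanine']['pKa']
print("Alanine pKa:", alanine_pka)

# Add a new field to an existing entry
amino_acids_info['Cysteine']['contains_sulfur'] = True

# Loop over the outer dictionary and read from each inner one
for name, properties in amino_acids_info.items():
    print(name, properties['code'], properties['polarity'])
```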

## Before you move on...
- In the following examples and exercises, the goal is to analyze the syntax rules as you go, rather than to explicitly lay out all the rules.
- Don't aim to `learn Python`; aim to `use Python`. The learning will come with time.
- The primers in this series are meant to be short and dense, and they encourage practical use of the concepts over mastering all of their possible uses.
- Focus on learning the commonalities and structure of programming over the details. When you get stuck, use foundation AI models such as [ChatGPT](https://chat.openai.com/) to assist you (but be critical of the results and test everything).
- Finally, most major Python libraries host documentation online, for example: [numpy](https://numpy.org/doc/), [matplotlib](https://matplotlib.org/stable/index.html). These can also be helpful, especially for looking up arguments for specific functions.


## Code Examples

In [None]:
# Importing necessary libraries for the following examples
import numpy as np
import matplotlib.pyplot as plt

# Basic Python Syntax: Variables and Printing
# -------------------------------------------------
# Defining a variable to store a DNA sequence
dna_sequence = "AGCTCGTACGATCG"
# Printing the DNA sequence
print("DNA Sequence:", dna_sequence)

# Data Structures: Lists and For Loops
# -------------------------------------------------
# Creating a list of amino acids
amino_acids = ['Alanine', 'Cysteine', 'Aspartic Acid', 'Glutamine']
# Iterating over the list and printing each amino acid
for acid in amino_acids:
    print("Amino Acid:", acid)

# Functions and Basic Calculations
# -------------------------------------------------
# Defining a function to estimate the molecular weight of a DNA sequence
# (sums approximate per-nucleotide weights in g/mol; a rough estimate only)
def calculate_mw(dna_seq):
    weights = {'A': 331.2, 'G': 347.2, 'C': 307.2, 'T': 322.2}
    return sum(weights[base] for base in dna_seq)

# Calculating and printing the molecular weight of the DNA sequence
mw = calculate_mw(dna_sequence)
print("Molecular Weight of DNA:", mw, "g/mol")

# Working with NumPy Arrays and Simple Statistics
# -------------------------------------------------
# Creating a NumPy array of sample pH values
ph_values = np.array([7.0, 7.4, 6.8, 7.2, 7.1])
# Calculating the mean pH
mean_ph = np.mean(ph_values)
print("Mean pH:", mean_ph)

# Plotting Data with Matplotlib
# -------------------------------------------------
# Generating a simple plot of pH values
plt.plot(ph_values)
plt.title("Sample pH Values Over Time")
plt.xlabel("Time (arbitrary units)")
plt.ylabel("pH")
plt.show()

# Advanced: Working with Dictionaries and Complex Data Structures
# -------------------------------------------------
# Creating a dictionary to map amino acids to their properties
amino_acid_properties = {
    'Alanine': {'pKa': 2.34, 'polarity': 'nonpolar'},
    'Cysteine': {'pKa': 1.96, 'polarity': 'nonpolar'},
    # ... other amino acids
}

# Accessing and printing properties of a specific amino acid
alanine_properties = amino_acid_properties['Alanine']
print("Alanine Properties:", alanine_properties)

# Conclusion: This code block demonstrates basic Python syntax and concepts
# useful in scientific computing, particularly in the field of biochemistry.


# Basic Exercises
- **Hint**: Don't just do these once. Explore, experiment, and explain what you see. When you're ready, move on to the advanced exercises.

## Exercise 1: Variables
- **Task**: Define a `variable` and print its value.

## Exercise 2: Lists
- **Task**: Create a `variable` and assign a `list` to it. Write a `for loop` to iterate through the list and print each item.
- **Note**: When creating the `for loop`, pay attention to the importance of indentation. Try to execute the code without indenting. What happens?

## Exercise 3: Functions
- **Task**: Create a function. It should take an input, perform some operation, and provide output. Be sure to test the function by `calling` it and providing it with some input. The simplest option is to `print` an input.

## Exercise 4: NumPy Arrays
- **Task**: Create two `numpy arrays` containing related values. Calculate the mean for each.

## Exercise 5: Plotting with Matplotlib
- **Task**: `Plot` the data from Exercise 4 using `matplotlib.pyplot` methods.

## Exercise 6: Advanced Data Structures (Dictionary)
- **Task**: Create a `dictionary` data structure that contains information about the four nucleotide bases found in DNA. Use the internet to identify properties to include.

## Solutions to Basic Exercises

In [None]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt

# Exercise 1: Define a variable
My_Var = 2
print("The value of My_Var is:", My_Var)

# Exercise 2: Loop through a list
My_List = ['item1', 'item2', 'item3', 'item4']
for item in My_List:
    print("Item =", item)

# Exercise 3: Create a function
# (the parameter is named `value` to avoid shadowing Python's built-in input())
def print_it(value):
    print(value)

x = "This statement will be printed"  # try other variable assignments, like numbers, etc.
print_it(x)

# Exercise 3: Create a function (something a bit more useful)
# Calculate the mean from an input list of data points.
def stats_calc(data):
    mean = sum(data) / len(data)  # What does len(data) return?
    return mean

data_list = [1, 5, 8, 10, 21]
print("The average is:", stats_calc(data_list))

# Exercise 4: Create numpy array(s) and calculate the mean of each
enzyme_rate = np.array([150, 175, 140, 121, 145])
pH = np.array([6.8, 6.4, 7.3, 7.8, 7.4])
print("Mean enzyme rate:", np.mean(enzyme_rate), "umoles/s")
print("Mean pH:", np.mean(pH))

# Exercise 5: Plot the data from the numpy arrays.
plt.plot(pH, enzyme_rate, 'o')
plt.title("Enzyme Rate vs. pH")
plt.xlabel("pH")
plt.ylabel("Enzyme Rate (umoles/s)")
plt.show()

# Advanced Exercises
Feel free to peek at the solutions, but don't copy and paste or you won't learn anything. Try to understand what each code element does, and get an intuitive feel for the syntax.

## Exercise 1: DNA Sequence Manipulation
- **Task**: Given a DNA sequence, write a function to compute its complementary strand.
- **Hint**: In DNA, A pairs with T, and G pairs with C.

## Exercise 2: Amino Acid Analysis
- **Task**: Extend the `amino_acids` list with five more amino acids. Then, iterate over this list and print each amino acid along with its length (number of characters).
- **Hint**: Use a `for` loop and the `len()` function.

## Exercise 3: Molecular Weight Calculator Enhancement
- **Task**: Modify the `calculate_mw` function to handle both DNA and RNA sequences. Account for the difference in molecular weight of Uracil in RNA.
- **Hint**: Add a condition to check if the sequence is DNA or RNA and adjust the weights dictionary accordingly.

## Exercise 4: Statistical Analysis of pH Values
- **Task**: Given a new set of pH values, calculate and print the mean, median, and standard deviation.
- **Hint**: Use NumPy functions `np.mean()`, `np.median()`, and `np.std()`.

## Exercise 5: Data Visualization Challenge
- **Task**: Create a scatter plot of the pH values against time. Customize the plot with a title, axis labels, and a different color for the markers.
- **Hint**: Use `plt.scatter()` and explore the `matplotlib.pyplot` documentation for customization options.

## Exercise 6: Advanced Data Structure
- **Task**: Add three more amino acids to the `amino_acid_properties` dictionary with their respective `pKa` and `polarity`. Then, write a function to print the properties of a given amino acid.
- **Hint**: Define a function that takes an amino acid name as input and accesses its properties in the dictionary. Good luck!


In [None]:
# Your Answer(s) Here

## Solutions to Advanced Exercises

In [None]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt

# Exercise 1: DNA Sequence Manipulation
def complement_dna(dna_seq):
    complement = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    return ''.join([complement[base] for base in dna_seq])

# Example usage
dna = "AGCTCGTACGATCG"
print("Complement of", dna, "is", complement_dna(dna))

# Exercise 2: Amino Acid Analysis
amino_acids = ['Alanine', 'Cysteine', 'Aspartic Acid', 'Glutamine', 
               'Valine', 'Leucine', 'Isoleucine', 'Methionine', 'Phenylalanine']
for acid in amino_acids:
    print("Amino Acid:", acid, "- Length:", len(acid))

# Exercise 3: Molecular Weight Calculator Enhancement
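def calculate_mw_enhanced(sequence):
    # Approximate nucleotide monophosphate weights (g/mol); ribonucleotides
    # carry one more oxygen than deoxyribonucleotides, so RNA values are higher
    weights_dna = {'A': 331.2, 'G': 347.2, 'C': 307.2, 'T': 322.2}
    weights_rna = {'A': 347.2, 'G': 363.2, 'C': 323.2, 'U': 324.2}
    # Simple heuristic: treat any sequence containing 'U' as RNA
    if 'U' in sequence:
        return sum(weights_rna[base] for base in sequence)
    else:
        return sum(weights_dna[base] for base in sequence)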

# Example usage
rna_seq = "AGCUCGUACGAUCG"
print("Molecular Weight of RNA:", calculate_mw_enhanced(rna_seq), "g/mol")

# Exercise 4: Statistical Analysis of pH Values
ph_values_new = np.array([6.9, 7.3, 7.2, 7.4, 7.0, 7.1])
print("Mean pH:", np.mean(ph_values_new))
print("Median pH:", np.median(ph_values_new))
print("Standard Deviation of pH:", np.std(ph_values_new))

# Exercise 5: Data Visualization Challenge
plt.scatter(range(len(ph_values_new)), ph_values_new, color='red')
plt.title("pH Values Over Time")
plt.xlabel("Time (arbitrary units)")
plt.ylabel("pH")
plt.show()

# Exercise 6: Advanced Data Structure
amino_acid_properties = {
    'Alanine': {'pKa': 2.34, 'polarity': 'nonpolar'},
    'Cysteine': {'pKa': 1.96, 'polarity': 'nonpolar'},
    'Valine': {'pKa': 2.32, 'polarity': 'nonpolar'},
    'Leucine': {'pKa': 2.36, 'polarity': 'nonpolar'},
    'Isoleucine': {'pKa': 2.36, 'polarity': 'nonpolar'}
}

def print_amino_acid_properties(name):
    properties = amino_acid_properties.get(name, "Not found")
    print(f"Properties of {name}: {properties}")

# Example usage
print_amino_acid_properties("Alanine")
