<a href="https://colab.research.google.com/github/ShiweiHe0713/Data-Science-for-Business-Techincal/blob/main/Python_basics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

To open notebook in Colab please click below:

<a href="https://colab.research.google.com/drive/14DVlOeSMyXZyqq1LQ__VueBpxe-YQkrS?authuser=1#scrollTo=FzWGAK6sK43D" target="_parent"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" /> </a>

# Python basics


**Spring 2024 - Instructor:  Chris Volinsky**

**Teaching Assistants: Aditya Deshpande, Stuti Mishra, Krutika Savani**

**Original Notebooks courtesy of Prof. Foster Provost and Rubing Li**

***

This notebook shows examples of Python code, including built-in functions, packages and programming structures useful for Data Science and Business Analytics. This should be review for you all!  At the bottom there are some pointers to a few resources.

## Python code

### 1. Variables, operations and data types
Variables are used to store data.
The data can be of many types. Here are three examples:

- Integer numbers
- Floating-point numbers (decimal numbers)
- Strings

The Python character for **variable assignment** is the equal sign (=). Let's create three variables, one of each type, with 3 different names:

In [None]:
some_integer = 5
some_float = 7.12
some_string = "Student"

We can **print** out these variables. Remember we need to run the previous cell first!

In [None]:
print (some_integer)
print (some_float)
print (some_string)

5
7.12
Student


What if I want to print some text and then some numbers? One easy way is to join the pieces into a single string with `+`, and joining with `+` only works when every piece is a string.

If you have data that is not a string (like an integer or float), you can **convert** it to a string:

In [None]:
print ("My integer is " + str(some_integer) + ".")
print ("My float converted into integer is " + str( int(some_float) ) + ".")

My integer is 5.
My float converted into integer is 7.
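
As a side note, `print` also accepts several arguments of any type, separated by commas, and converts each one to text itself. The sketch below reuses the variable names from above; the `sep` argument is standard Python, but the exact messages are only illustrative.

```python
some_integer = 5
some_float = 7.12

# print() converts each argument to text and joins them with a space by default
print("My integer is", some_integer)

# sep="" removes the automatic space, so the period sits right after the number
print("My float converted into integer is ", int(some_float), ".", sep="")

# The same kind of message built by hand with str() and +
message = "My integer is " + str(some_integer) + "."
print(message)
```

Building the string by hand gives full control over spacing, while the comma form saves the `str()` calls.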


What else can we do with our variables? We can do basic math: **operations**.

In [None]:
print ("sum " + str( some_integer + some_float ))
print ("multiplication " + str ( some_integer * some_float ))
print ("quotient " + str( some_integer / some_float ))
print ("power " + str( 10**some_integer ))

sum 12.120000000000001
multiplication 35.6
quotient 0.7022471910112359
power 100000
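
The long tail in `12.120000000000001` is not a bug: floats are stored in binary, so many decimal values can only be approximated. A minimal sketch of tidying this up with the built-in `round`, plus two more arithmetic operators (`//` and `%`); the values `17` and `5` are illustrative and not from the notebook.

```python
some_integer = 5
some_float = 7.12

raw_sum = some_integer + some_float   # carries a tiny binary rounding error
rounded_sum = round(raw_sum, 2)       # keep two decimal places

print("raw sum", raw_sum)
print("rounded sum", rounded_sum)

# Two more operators, shown on illustrative values
whole_part = 17 // 5   # floor division: how many whole times 5 fits into 17
remainder = 17 % 5     # modulo: what is left over after that division
print("17 // 5 =", whole_part, "and 17 % 5 =", remainder)
```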


We can store this as a new variable and print it:

In [None]:
my_sum = some_integer + some_float
print ("Sum variable: " + str( my_sum ))

Sum variable: 12.120000000000001


There are also other **data structures**:

- Lists (sometimes referred to as "arrays", but look up the difference)
- Dictionaries
- Sets


In [None]:
some_list = [0,0,1,2,3,3,4.5,7.6]
some_dictionary = {'student1': '(929)-000-0000', 'student2': '(917)-000-0000', 'student3': '(470)-000-0000'}
some_set = set( [1,2,4,4,5,5] )

print ("This is a list:  " + str(some_list))
print ("This is a dictionary:  " + str( some_dictionary ))
print ("This is a set:  " + str( some_set ))

This is a list:  [0, 0, 1, 2, 3, 3, 4.5, 7.6]
This is a dictionary:  {'student1': '(929)-000-0000', 'student2': '(917)-000-0000', 'student3': '(470)-000-0000'}
This is a set:  {1, 2, 4, 5}
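
Notice that the set printed without its duplicate values. The sketch below, which rebuilds the same list so that it runs on its own, uses the built-in `len` and the `in` operator to make that visible.

```python
some_list = [0, 0, 1, 2, 3, 3, 4.5, 7.6]
some_set = set(some_list)   # converting to a set drops duplicate values

# len() counts the elements in each structure
list_length = len(some_list)
set_length = len(some_set)
print("list length:", list_length)
print("set length:", set_length)

# "in" tests whether a value is present
print("is 4.5 in the list?", 4.5 in some_list)
print("is 10 in the set?", 10 in some_set)
```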


How can we use  **individual** elements?

In Python (and generally by computer science convention), we count elements of a _list_ starting from zero! To get the first item we should look in the 0th space:


In [None]:
print (some_list[0])

0
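
Zero-based counting extends to negative indexes and to slices. A short sketch, assuming the same list values as above (redefined here so the cell is self-contained):

```python
some_list = [0, 0, 1, 2, 3, 3, 4.5, 7.6]

first_item = some_list[0]       # index 0 is the first element
last_item = some_list[-1]       # negative indexes count from the end
middle_items = some_list[2:5]   # slice: start at index 2, stop before index 5

print(first_item)
print(last_item)
print(middle_items)
```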


Adding things to the end of the list is "appending" them:

In [None]:
some_list.append(500)
print (some_list)

[0, 0, 1, 2, 3, 3, 4.5, 7.6, 500]


How can we retrieve an element (**VALUE**) of a _dictionary_ ?  Use its **"KEY"** !!

In [None]:
print (some_dictionary['student1'])

(929)-000-0000
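
A few more dictionary operations, sketched under the assumption of a smaller copy of the dictionary above. The key `'student4'`, its placeholder number, and the fallback text are made up for illustration.

```python
some_dictionary = {'student1': '(929)-000-0000', 'student2': '(917)-000-0000'}

# Assigning to a new key adds an entry ('student4' is a hypothetical key)
some_dictionary['student4'] = '(212)-000-0000'

# .get() returns a fallback value instead of raising KeyError for a missing key
missing = some_dictionary.get('student9', 'no number on file')
print(missing)

# .keys() lists the keys currently stored, in insertion order
all_keys = list(some_dictionary.keys())
print(all_keys)
```

Using `.get()` is handy when you are not sure a key exists; plain square brackets raise an error in that case.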


For more customizable string creation and printing, Python also includes rich formatting capabilities that may be familiar to developers of other high-level programming languages. Some examples are below. In a nutshell, one adds placeholders to the string being constructed. These placeholders always start with a `%` followed by some additional information that determines how the data will be formatted. After the string, another `%` is used, followed by the actual data to be inserted into the placeholders. The full capabilities of Python string formatting are beyond the scope of this primer; for more info, please consult the [official documentation](https://docs.python.org/2/library/stdtypes.html#string-formatting).

In [None]:
print ("generic %s formatting, equivalent to the str() functionality seen earlier" % "STRING")
print ("an example with zero padding an integer %03d and multiple %s" % (5, "PLACEHOLDERS"))
print ("often one wants to truncate the precision when displaying floats %.3f" % 3.1415926536)

generic STRING formatting, equivalent to the str() functionality seen earlier
an example with zero padding an integer 005 and multiple PLACEHOLDERS
often one wants to truncate the precision when displaying floats 3.142
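
The same format codes also work with `str.format` and with f-strings (f-strings assume Python 3.6 or later). This is a small sketch for comparison, not a replacement for the `%` examples above.

```python
# Zero-padding an integer to width 3, three equivalent ways
padded_percent = "%03d" % 5
padded_format = "{:03d}".format(5)
padded_fstring = f"{5:03d}"
print(padded_percent, padded_format, padded_fstring)

# Rounding a float to three decimal places inside an f-string
pi_text = f"{3.1415926536:.3f}"
print("pi to three decimals: " + pi_text)
```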


### 2. Create functions

Functions are essentially blocks of code that you can reference by name. Functions allow us to execute predefined operations and to define our own operations that will be available later.  They encapsulate procedures. You should define functions for steps that you anticipate using multiple times. If you only perform a series of steps once or twice, you probably do not need to define a function.

(*If you haven't thought this through before, consider the drawback of repeated code: what if later you realize that you need to fix something in that code block.  You'd have to go through and fix it everywhere.*)

Functions are created using the **def** statement, following the same indented code structure as a conditional or a loop.

```
def function_name(inputs):
    statements
    return object(s) # optional
```

For example, consider having to calculate the area of a circle.

In [None]:
## define a function to calculate the area of a circle
def area_of_a_circle(radius):
    area = 3.1416 * radius ** 2
    return area

In [None]:
## call a function
circle_area = area_of_a_circle(5)
print ("Area of a circle with radius 5 is: " + str( circle_area))

Area of a circle with radius 5 is: 78.53999999999999


Can you see what is going on here? My function that I helpfully named `"area_of_a_circle"` takes one **argument** that we will call radius. It then uses this radius to get the area and then *returns* it. Now, whenever I want to get the area of some circle, I simply call `area_of_a_circle()` and place the radius in the middle of the parentheses.
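
As a variation, here is a sketch of two more functions. Both names (`area_of_a_circle_pi` and `area_of_a_rectangle`) are hypothetical and not part of the original notebook; the first swaps the hard-coded `3.1416` for `math.pi` from the standard library, and the second shows a function with two arguments.

```python
import math

def area_of_a_circle_pi(radius):
    """Return the area of a circle, using math.pi instead of 3.1416."""
    return math.pi * radius ** 2

def area_of_a_rectangle(width, height):
    """Return the area of a rectangle from its two side lengths."""
    return width * height

circle_area_rounded = round(area_of_a_circle_pi(5), 2)
print("Circle:", circle_area_rounded)
print("Rectangle:", area_of_a_rectangle(3, 4))
```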

### 3. Loops / iterations

For data analysis we do a lot of repetitive things. This doesn't mean we need to do a ton of copy and pasting, though. We can use **loops** to make this easy. Loops are used to perform a series of steps repeatedly. As a very simple example, what if we wanted to square each number from 1 to 5?

In [None]:
for number in [1, 2, 3, 4, 5]:
    print (number * number)

1
4
9
16
25
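
Typing out `[1, 2, 3, 4, 5]` gets tedious for longer sequences. A minimal sketch of the same loop using the built-in `range`, this time collecting the results in a list (the name `squares` is an illustrative choice):

```python
squares = []
for number in range(1, 6):   # range(1, 6) yields 1, 2, 3, 4, 5 (6 is excluded)
    squares.append(number * number)

print(squares)
```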


Let's use the function we defined before. Remember that this function can only be used in **this notebook**

( unless we write a **"script"** file, but we'll see that later... ):

In [None]:
for number in [1, 2, 3, 4, 5]:
    print ("Area of circle with radius " + str(number) + " is: " + str( area_of_a_circle(number) ))

Area of circle with radius 1 is: 3.1416
Area of circle with radius 2 is: 12.5664
Area of circle with radius 3 is: 28.2744
Area of circle with radius 4 is: 50.2656
Area of circle with radius 5 is: 78.53999999999999


### 4. Conditionals and comparisons

Sometimes we need to check something before deciding what to do next.

**if** statements are the most common type of conditional. They allow you to execute a different set of statements under different conditions.

For example,

In [None]:
def is_best_prof(name):
    if name == "Chris":
        return True
    else:
        return False

In [None]:
print (is_best_prof("Chris"))

True


In [None]:
print (is_best_prof("Marvin"))

False


In [None]:
my_prof = "Chris"
if is_best_prof(my_prof):
    print("You're going to have a great semester!")
else:
    print("Well, make the best of it!")

You're going to have a great semester!


You see in that last one how we have a conditional in the cell, and then call a function that has a conditional inside it?  

Let's put a whole bunch of these things together:

In [None]:
my_profs = ["John", "Paul", "George", "Ringo"]
one_best = False
for prof in my_profs:
    if is_best_prof(prof):
        one_best = True
if one_best:
    print("You're going to have a great semester!")
else:
    print("Well, make the best of it!")

Well, make the best of it!
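
The search loop above can itself be wrapped in a function. The sketch below assumes a hypothetical helper named `has_best_prof`; it returns as soon as a match is found, so no `one_best` flag is needed. `is_best_prof` is redefined so the cell runs on its own.

```python
def is_best_prof(name):
    # Same comparison as the function defined earlier
    return name == "Chris"

def has_best_prof(prof_names):
    """Return True if any name in the list is the best prof."""
    for prof in prof_names:
        if is_best_prof(prof):
            return True    # stop at the first match
    return False           # the loop finished without a match

print(has_best_prof(["John", "Paul", "George", "Ringo"]))
print(has_best_prof(["Ringo", "Chris"]))
```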


As we can see here, we made a **comparison** of names with the "equal" operation  (==).  Remember ... it's == not just = !

Other comparisons:

- strictly less than  <
- less than or equal  <=
- strictly greater than  >
- greater than or equal  >=
- not equal  !=
- object identity  "is"
- negated object identity "is not"
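
A short sketch of these comparisons on illustrative values. The last lines show how `==` (same value) differs from `is` (same object in memory).

```python
a = 3
b = 7

print(a < b)    # strictly less than
print(a >= b)   # greater than or equal
print(a != b)   # not equal

# Two separate lists that hold equal values
list_one = [1, 2, 3]
list_two = [1, 2, 3]
same_values = list_one == list_two   # compares the contents
same_object = list_one is list_two   # compares identity: two distinct lists

alias = list_one                     # a second name for the same list
alias_is_same = alias is list_one

print(same_values, same_object, alias_is_same)
```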

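What if we want to combine more than one comparison?
We should include logical operations such as:

- "and" (for plain True/False values "&" gives the same result, but "&" is a bitwise operator that binds more tightly than comparisons, so each comparison must be wrapped in parentheses)
- "or" (its bitwise counterpart is "|", with the same caveat)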
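
A sketch of combining comparisons, with `age = 10` as an illustrative value. It also shows why `&` needs care: `&` and `|` are bitwise operators that are evaluated before comparisons, so they only agree with `and` and `or` when each comparison is wrapped in parentheses.

```python
age = 10

# Logical operators on comparisons
in_range = (age >= 20) and (age <= 40)
out_of_range = (age < 20) or (age > 40)
chained = 20 <= age <= 40          # Python also allows chained comparisons

# With parentheses, & agrees with "and" for True/False values
in_range_bitwise = (age >= 20) & (age <= 40)

# Without parentheses, this is read as: age >= (20 & age) <= 40
no_parentheses = age >= 20 & age <= 40

print(in_range, out_of_range, chained, in_range_bitwise, no_parentheses)
```

The last value differs from `in_range`, which is why the parentheses matter whenever `&` is used.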

Let's see if you can guess my age with this function!!


In [None]:
def is_my_age(age_argument):
    if age_argument < 20:
        return "Of course not!"
    elif (age_argument >= 20) and (age_argument <= 40):
        return "Maybe.."
    elif age_argument > 40:
        return "Don't even think about it!"

In [None]:
print (is_my_age(10))

Of course not!


In [None]:
print (is_my_age(23))

Maybe..


In [None]:
print (is_my_age(80))

Don't even think about it!


## Help, help, and more help!

The two alternative textbooks (ISLP and Shmueli books) mentioned in the Syllabus are great, and free, resources with lots of data science examples and Python code.  They will be invaluable for you!

Please don't get discouraged if you are stuck. This is how professional programmers work every day.  Try to do it yourself first, and then reach out for other resources.  Google your problem and it will likely take you to posts at [StackOverflow](http://stackoverflow.com), a popular programming question and answer site, with an extremely rich repository of solutions to problems that people have encountered.  It is likely that someone has asked the question you need answering and has posted a solution.

Of course, these days it is also easy to ask a question of GenAI tools like ChatGPT.  These will often get you the correct answer, but I caution against simple cut-and-paste of the solution.  Try and work your way through the code and understand the answer, and even ask ChatGPT questions like "why is that line in the code?".  Typing the code in instead of cut/paste can help you with the learning process.

Programming can be frustrating. It's natural to get stuck or frustrated from time to time, even with the incredible resources that are now available. Take breaks, ask others for help, and be kind to yourself :)

---

Here are some other resources suggested by a previous professor of this class:
- [Codecademy's Python Course](https://www.codecademy.com/learn/learn-python-3). Working through this class will give you a _great_ foundation for Python.
- [Python for Data Analysis](https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython-ebook/dp/B009NLMB8Q/ref=mt_kindle?_encoding=UTF8&me=) was the book that Prof. Provost suggested to prior iterations of this course. You can take a look at the chapters: Preliminaries, Introductory Examples (e.g. "Counting Time Zones with pandas"), IPython (pages 46 to 62) and especially Pandas, one of the main Python packages for data analysis.  We will work with Pandas in class.
- [Pandas Cookbook](https://www.amazon.com/Pandas-Cookbook-Scientific-Computing-Visualization/dp/1784393878) is another great resource to learn Pandas. It has lots of practice problems with detailed solutions in IPython notebooks.
