### What is a Function?

- A function is a block of code that only runs when it's called.
- You can pass data into a function through its parameters; the values you pass in are called arguments.
- The function can return data as a result.
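The three points above fit in a few lines of code. The function name `greet` and the argument `"Ada"` are hypothetical, chosen only for illustration:

```python
# A hypothetical example function: `name` is its parameter
def greet(name):
    # Build the result and send it back to the caller
    return "Hello, " + name

# Nothing inside greet() runs until it is called; "Ada" is the argument
message = greet("Ada")
print(message)  # prints: Hello, Ada
```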

#### Importance
Functions let us reuse code and make it more modular, which is important for complex data analysis and plotting routines.



#### Types of Functions

| Type of Function              | Example Function              | Section            |
|-------------------------------|-------------------------------|--------------------|
| Built-in functions            | `max()`                       | 1. Getting Started |
| User-defined functions        | `def my_function(): pass`     | 16. Functions      |
| Lambda functions              | `lambda x: x + 1`             | 17. Lambda         |
| Standard library functions    | `math.sqrt()`                 | 18. Modules        |
| Third-party library functions | `numpy.array()`               | 19. Library        |

Note: We won't cover generator, asynchronous, or recursive functions, as they are outside the scope of data analytics.

#### Built-in Functions

These come standard with Python. We've already used a few:

* `print()`: Displays output
* `type()`: Checks the data type of objects
* `range()`: Generates a sequence of numbers, useful in loops
* `len()`: Counts the number of elements in a data structure

[Here are all the built-in functions in Python](https://docs.python.org/3/library/functions.html).

In [1]:
skill_list = ['Python', 'SQL', 'Excel']
print(skill_list)

['Python', 'SQL', 'Excel']


In [2]:
type(skill_list)

list

In [3]:
len(skill_list)

3

In [4]:
range(0,5)

range(0, 5)
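The output above shows that `range()` does not display its numbers directly; it produces them lazily when something iterates over it. Wrapping it in `list()` reveals the values, and the stop value is excluded:

```python
# Convert the range to a list to see the numbers it produces
numbers = list(range(0, 5))
print(numbers)           # [0, 1, 2, 3, 4]

# The stop value (5) is excluded, so the range holds 5 numbers
print(len(range(0, 5)))  # 5

# A typical use: looping over the positions of a list
skill_list = ['Python', 'SQL', 'Excel']
for i in range(len(skill_list)):
    print(i, skill_list[i])
```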

In [5]:
data_salaries = [95000, 100000, 85000, 97000, 140000]

In [6]:
min(data_salaries)

85000

In [7]:
max(data_salaries)

140000

In [8]:
sum(data_salaries)

517000

In [9]:
sorted(data_salaries)

[85000, 95000, 97000, 100000, 140000]
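`sorted()` also accepts a `reverse` argument for descending order, and it returns a new list rather than changing the original. A short check using the same `data_salaries`:

```python
data_salaries = [95000, 100000, 85000, 97000, 140000]

# Highest salary first; sorted() builds a brand-new list
descending = sorted(data_salaries, reverse=True)
print(descending)     # [140000, 100000, 97000, 95000, 85000]

# The original list keeps its original order
print(data_salaries)  # [95000, 100000, 85000, 97000, 140000]
```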

## User-Defined Functions

These are functions you define yourself with the `def` keyword, using any name you choose, e.g. `calculate_something_special()`.

#### WARNING 
Do not give your function the same name as a built-in Python function or other standard Python object.

For example, this is a bad idea:

```python
def print(input):
    return "Hello" + input
```

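Here, the new definition shadows the built-in `print()`: until you delete it or restart the session, every call to `print()` runs this function instead, which returns a string rather than displaying anything. The parameter name `input` also shadows the built-in `input()` inside the function.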
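If you do shadow a built-in by accident, deleting your definition with `del` lets the name fall back to the built-in again. A minimal sketch (the parameter is renamed `text` here so that `input()` is not shadowed as well):

```python
# Deliberately shadow the built-in print() to show the problem
def print(text):
    return "Hello" + text

# This calls our function: it returns a string and displays nothing
result = print(" world")

# Remove our definition; the name `print` now refers to the built-in again
del print
print(result)  # Hello world
```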

In [10]:
base_salary = 100000
bonus_rate = 0.1

total_salary = base_salary * (1 + bonus_rate)

total_salary

110000.00000000001
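The trailing `...00000000001` is floating-point rounding error, not a bug in the formula: `0.1` has no exact binary representation, so `1 + 0.1` is stored as a value very slightly above 1.1. For display or reporting, round the result or format it:

```python
base_salary = 100000
bonus_rate = 0.1

total_salary = base_salary * (1 + bonus_rate)
print(total_salary)             # 110000.00000000001

# Round to two decimal places (cents)
print(round(total_salary, 2))   # 110000.0

# Or format it as a currency-style string with thousands separators
print(f"{total_salary:,.2f}")   # 110,000.00
```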

In [12]:
def calculate_salary():
    base_salary = 100000
    bonus_rate = 0.1

    total_salary = base_salary * (1 + bonus_rate)
    
    return total_salary

In [13]:
calculate_salary()

110000.00000000001

In [18]:
# define the parameters in the function signature
def calculate_salary(base_salary, bonus_rate):

    total_salary = base_salary * (1 + bonus_rate)
    
    return total_salary


In [19]:
calculate_salary(110000, 0.2)

132000.0

In [20]:
# give a parameter a default value to make it optional (e.g., a standard bonus rate)
def calculate_salary(base_salary, bonus_rate=0.1):

    total_salary = base_salary * (1 + bonus_rate)
    
    return total_salary


In [21]:
calculate_salary(100000)

110000.00000000001

In [None]:
# the default can be overridden by passing a value explicitly
calculate_salary(100000, 0.2)

120000.0
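Arguments can also be passed by name (keyword arguments), which makes calls easier to read and lets you list them in any order. Leaving out a parameter that has no default raises a `TypeError`:

```python
def calculate_salary(base_salary, bonus_rate=0.1):
    total_salary = base_salary * (1 + bonus_rate)
    return total_salary

# Keyword arguments name each parameter, so their order doesn't matter
print(calculate_salary(bonus_rate=0.2, base_salary=100000))  # 120000.0

# base_salary has no default, so omitting it is an error
try:
    calculate_salary()
except TypeError as err:
    print("Error:", err)
```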

### Practice

Create a function `job_title_contains` that takes a job title and a keyword as arguments and returns `True` if the job title contains the keyword, otherwise `False`.

To confirm the function works, test it with the job title `'Data Scientist'` and the keyword `'Data'`.

In [27]:
job_title = 'Data Scientist'
keyword = 'Data'

def job_title_contains(job_title, keyword):
    if keyword in job_title:
        return True
    else:
        return False

In [24]:
job_title_contains(job_title, keyword)

True

In [25]:
job_title_contains('Data Analyst', 'Analyst')

True

In [28]:
job_title_contains('Data Analyst', 'Engineer')

False

In [29]:
# simpler way: `keyword in job_title` already evaluates to True or False
def job_title_contains(job_title, keyword):
    return keyword in job_title


In [30]:
job_title_contains(job_title, keyword)

True

In [31]:
job_title_contains('Data Analyst', 'Engineer')

False
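Both versions above are case-sensitive, so `job_title_contains('Data Scientist', 'data')` returns `False`. If a case-insensitive match is wanted (an assumption about the requirements, not part of the exercise), one option is to lower-case both strings before comparing:

```python
def job_title_contains(job_title, keyword):
    # Lower-case both strings so 'data' also matches 'Data'
    return keyword.lower() in job_title.lower()

print(job_title_contains('Data Scientist', 'data'))    # True
print(job_title_contains('Data Analyst', 'Engineer'))  # False
```

`str.casefold()` is a stricter alternative to `lower()` for non-English text.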

Create a function `average_salary` that takes a list of salaries and returns the average salary. Test it with the salaries `[95000, 120000, 105000, 90000, 130000]`.

In [32]:
salaries = [95000, 120000, 105000, 90000, 130000]

def average_salary(salaries):
    return sum(salaries)/len(salaries)


In [33]:
average_salary(salaries)

108000.0
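`sum(salaries) / len(salaries)` raises `ZeroDivisionError` when the list is empty. A minimal guarded version is sketched below; returning `None` for an empty list is an assumption, and raising a clearer error would be an equally valid choice. The standard library's `statistics.mean()` computes the same average:

```python
import statistics

def average_salary(salaries):
    # Assumption: an empty list has no average, so return None
    if not salaries:
        return None
    return sum(salaries) / len(salaries)

salaries = [95000, 120000, 105000, 90000, 130000]
print(average_salary(salaries))   # 108000.0
print(average_salary([]))         # None

# Same value from the standard library
print(statistics.mean(salaries))
```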

Create a function `salary_statistics` that takes a list of salaries and returns a dictionary with the minimum, maximum, and average salary. Test it with the salaries `[95000, 120000, 105000, 90000, 130000]`.

In [34]:
def salary_statistics(salaries):
    # Return a dictionary of summary statistics
    statistics = {'minimum': min(salaries), 'maximum': max(salaries), 'average': sum(salaries) / len(salaries)}
    return statistics

In [35]:
salaries = [95000, 120000, 105000, 90000, 130000]
salary_statistics(salaries)

{'minimum': 90000, 'maximum': 130000, 'average': 108000.0}

In [36]:
# Alternative: return the dictionary directly (note the shorter keys)

def salary_statistics(salaries):
    return {
        'min': min(salaries),
        'max': max(salaries),
        'average': sum(salaries) / len(salaries)
    }
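Calling the alternative version shows its shorter keys, and individual values can be read back by key:

```python
def salary_statistics(salaries):
    return {
        'min': min(salaries),
        'max': max(salaries),
        'average': sum(salaries) / len(salaries)
    }

salaries = [95000, 120000, 105000, 90000, 130000]
stats = salary_statistics(salaries)
print(stats)             # {'min': 90000, 'max': 130000, 'average': 108000.0}

# Look up a single statistic by its key
print(stats['average'])  # 108000.0
```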

Create a function `job_posting_summary` that takes a list of job postings, where each posting is a dictionary with the keys `'title'`, `'location'`, and `'salary'`. It should return a summary dictionary containing the total number of postings, the average salary, and the unique locations. Test it with the following `job_postings`: `[{'title': 'Data Scientist', 'location': 'New York', 'salary': 95000}, {'title': 'Data Analyst', 'location': 'San Francisco', 'salary': 85000}, {'title': 'Machine Learning Engineer', 'location': 'New York', 'salary': 115000}]`.

In [41]:
job_postings = [
    {'title': 'Data Scientist', 'location': 'New York', 'salary': 95000},
    {'title': 'Data Analyst', 'location': 'San Francisco', 'salary': 85000},
    {'title': 'Machine Learning Engineer', 'location': 'New York', 'salary': 115000}
]

# what we need to get: {total_postings: x, average_salary: y, unique_locations: z}

def job_posting_summary(job_postings):
    total_postings = len(job_postings)
    average_salary = sum(posting['salary'] for posting in job_postings) / len(job_postings)
    unique_locations = set(posting['location'] for posting in job_postings)
    return {
        'total_postings': total_postings,
        'average_salary': average_salary,
        'unique_locations': unique_locations
    }

job_posting_summary(job_postings)



{'total_postings': 3,
 'average_salary': 98333.33333333333,
 'unique_locations': {'New York', 'San Francisco'}}

In [42]:
# Alternative: compute the total salary first and return the unique locations as a list
def job_posting_summary(job_postings):
    total_postings = len(job_postings)
    total_salary = sum(posting['salary'] for posting in job_postings)
    average_salary = total_salary / total_postings
    unique_locations = list(set(posting['location'] for posting in job_postings))
    return {
        'total_postings': total_postings,
        'average_salary': average_salary,
        'unique_locations': unique_locations
    }

job_postings = [
    {'title': 'Data Scientist', 'location': 'New York', 'salary': 95000},
    {'title': 'Data Analyst', 'location': 'San Francisco', 'salary': 85000},
    {'title': 'Machine Learning Engineer', 'location': 'New York', 'salary': 115000}
]
job_posting_summary(job_postings)

{'total_postings': 3,
 'average_salary': 98333.33333333333,
 'unique_locations': ['San Francisco', 'New York']}
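Sets have no guaranteed order, and the order of strings in a set can change between Python sessions, which is why the two outputs above list the locations differently. Wrapping the set in `sorted()` gives a list in a predictable alphabetical order. A sketch of that variant:

```python
def job_posting_summary(job_postings):
    total_postings = len(job_postings)
    average_salary = sum(p['salary'] for p in job_postings) / total_postings
    # sorted() converts the set of locations into an alphabetically ordered list
    unique_locations = sorted({p['location'] for p in job_postings})
    return {
        'total_postings': total_postings,
        'average_salary': average_salary,
        'unique_locations': unique_locations
    }

job_postings = [
    {'title': 'Data Scientist', 'location': 'New York', 'salary': 95000},
    {'title': 'Data Analyst', 'location': 'San Francisco', 'salary': 85000},
    {'title': 'Machine Learning Engineer', 'location': 'New York', 'salary': 115000}
]
summary = job_posting_summary(job_postings)
print(summary['unique_locations'])  # ['New York', 'San Francisco']
```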