# Programming Boot-Up

### 1. What native data structures can you name in Python? Of these, which are mutable, and which are immutable?  

* From Python3 documentation: https://docs.python.org/3/library/stdtypes.html   
The principal built-in types are **numerics**, **sequences**, **mappings**, **classes**, **instances** and **exceptions**.  
    * Numeric Types - **int, float, complex**
        * **bool** is a subtype of **int**
    * Iterator Types - Generator Types
    * Sequence Types - **list, tuple, range**
    * Text Sequence Type - **str**
    * Binary Sequence Types — **bytes, bytearray, memoryview**
    * Set Types — **set, frozenset**
    * Mapping Types - **dict**
    * Context Manager Types
    * Other Built-in Types - **modules, classes and class instances, functions, methods, code objects, type objects, the null object, the ellipsis object, the notimplemented object, boolean values, internal objects**

* From Wikibooks: https://en.wikibooks.org/wiki/Python_Programming/Data_Types

| Some immutable types | Some mutable types |
| -------------------- | ------------------ |
| int, float, complex | array |
| str | bytearray |
| bytes | list |
| tuple | set |
| frozenset | dict |
| bool | |
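
The split in the table can be checked directly. The following is a minimal sketch (all variable names are illustrative): a list is changed in place and keeps its identity, while a tuple rejects item assignment and a string method returns a new object.

```python
# Mutable: a list can be changed in place, so it stays the same object.
numbers = [1, 2, 3]
id_before = id(numbers)
numbers.append(4)
same_list_object = id(numbers) == id_before

# Immutable: a tuple rejects item assignment with a TypeError.
point = (1, 2)
try:
    point[0] = 10
    tuple_was_modified = True
except TypeError:
    tuple_was_modified = False

# Immutable: "changing" a string builds a new string and leaves the old one alone.
greeting = "hello"
shouted = greeting.upper()

print(numbers)             # [1, 2, 3, 4]
print(same_list_object)    # True
print(tuple_was_modified)  # False
print(greeting, shouted)   # hello HELLO
```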


### 2. Explain the difference between a list and a dictionary.

From Quora: https://www.quora.com/What-is-the-difference-between-a-list-and-a-dictionary-in-Python
* **List** is a versatile mutable sequence type, written as comma-separated values between square brackets.
    * Elements in a list keep their order unless explicitly reordered.
    * Elements are accessed through their numeric (zero-based) index.
    * If you have an ordered collection of data that does not need lookup by key, use a list.
    * When you have to deal with values that can change, use a list.
* **Dictionary** is a mutable collection of key-value pairs. Since Python 3.7 dictionaries preserve insertion order, but entries are looked up by key, not by position. Key lookup stays fast even for large amounts of data.
    * Every entry is a key-value pair.
    * Elements are accessed by key.
    * When you are mapping values to unique keys, use a dictionary.
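
A short sketch of the two access styles described above; the fruit names and prices are made up for illustration.

```python
# A list keeps items in order and is indexed by zero-based position.
fruits = ["apple", "banana", "cherry"]
first_fruit = fruits[0]
fruits.append("date")

# A dict maps unique keys to values and is indexed by key.
prices = {"apple": 0.5, "banana": 0.25}
apple_price = prices["apple"]
prices["cherry"] = 2.0            # add a new key-value pair
kiwi_price = prices.get("kiwi")   # .get() returns None instead of raising KeyError

print(first_fruit)   # apple
print(fruits)        # ['apple', 'banana', 'cherry', 'date']
print(apple_price)   # 0.5
print(kiwi_price)    # None
```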
                   
### 3. In a list, what data types can be elements?

* The elements of a list can be of any type (float, string, tuple, list, etc.), and types can be mixed.
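
As a quick illustration, the list below mixes several types, including another list and a function. The helper `double` is a made-up name used only for this sketch.

```python
def double(x):
    """Hypothetical helper, only here to show that functions can be list elements."""
    return 2 * x

mixed = [42, 3.14, "text", (1, 2), [3, 4], {"key": "value"}, None, double]
type_names = [type(item).__name__ for item in mixed]

print(type_names)
# ['int', 'float', 'str', 'tuple', 'list', 'dict', 'NoneType', 'function']
```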
       
### 4. In a dictionary, what data types can the key be? And the values? Why?
    
* Keys can be of any hashable type, which includes strings, numbers, and tuples whose elements are all hashable. Mutable containers such as lists, dicts, and sets cannot be keys. Key types can be mixed.
    * For example, lists can't be keys because they are mutable and therefore not hashable: a dict locates an entry through the hash of its key, so a key whose contents could change after insertion could no longer be found reliably.
* Values can be of any type (including other dicts), and types can be mixed.
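
The hashability rule can be seen in a small sketch (names and values are illustrative): strings, numbers, and tuples work as keys, a list raises `TypeError`, and values may be any object, including nested containers.

```python
lookup = {
    "name": "string key",
    42: "integer key",
    (52.5, 13.4): "tuple key",
}

# A list is mutable and unhashable, so using it as a key fails.
try:
    lookup[[52.5, 13.4]] = "list key"
    list_key_accepted = True
except TypeError:
    list_key_accepted = False

# A tuple is only hashable if everything inside it is hashable.
try:
    hash((1, [2, 3]))
    nested_tuple_hashable = True
except TypeError:
    nested_tuple_hashable = False

# Values have no such restriction: here a dict holding a list.
lookup["nested"] = {"inner": [1, 2, 3]}

print(list_key_accepted)       # False
print(nested_tuple_hashable)   # False
print(lookup[(52.5, 13.4)])    # tuple key
```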
    
    
### 5. When would you use a list vs. a tuple vs. a set in Python?

* Use **sets** if you have hashable items, don't need order or duplicates (a set keeps each item only once), and want fast membership checks.
* Use **lists** when you care about order or duplicates, want to change the items (mutate), or have non-hashable items.
* Use **tuples** for a fixed collection of values that you don't need to change later, for example a record you only iterate over or unpack. Tuples are immutable, can be slightly cheaper to create than lists, and can serve as dict keys or set elements when their items are hashable.
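
A minimal sketch of the three choices, using made-up visitor names: the list keeps order and duplicates, the set deduplicates and answers membership questions, and the tuple acts as a fixed record.

```python
# list: ordered, keeps duplicates, can grow.
visits = ["ann", "bob", "ann", "cy", "bob"]
visits.append("dee")

# set: each item stored once, fast membership checks, no guaranteed order.
unique_visitors = set(visits)
has_bob = "bob" in unique_visitors

# tuple: fixed record that is unpacked, not modified.
ORIGIN = (0.0, 0.0)
x, y = ORIGIN

print(len(visits))               # 6
print(sorted(unique_visitors))   # ['ann', 'bob', 'cy', 'dee']
print(has_bob)                   # True
print(x, y)                      # 0.0 0.0
```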
    
    
### 6. Explain the difference between a for loop and a while loop.

* A **while** loop is used when we do not know beforehand how many times the loop needs to be executed.
    * In a while loop the condition is checked first; if it is true, control enters the body of the loop, and the check is repeated after every pass.
* A **for** loop is used when we already know how many times the loop needs to be executed, typically because we iterate over a sequence or a range of indices.
    * A for loop always binds a loop variable to each item in turn; a while loop does not require one, so any counter must be updated by hand.
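
The contrast can be sketched as follows (the numbers are arbitrary): the for loop runs once per item of a known range, while the while loop keeps halving until its condition fails.

```python
# for: the number of iterations is fixed by the iterable.
squares = []
for n in range(5):
    squares.append(n * n)

# while: repeat until a condition stops holding; the counter is updated by hand.
value = 100
halvings = 0
while value > 1:
    value //= 2
    halvings += 1

print(squares)    # [0, 1, 4, 9, 16]
print(halvings)   # 6
```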


### 7. What packages in the standard library, useful for Data Science work, do you know?

* In the standard library. From Python3 documentation: https://docs.python.org/3/library/index.html

    1. **re — Regular expression operations**. This module provides regular expression matching operations.
        * re.split(pattern, string, maxsplit=0, flags=0), re.sub(pattern, repl, string, count=0, flags=0), re.findall(pattern, string, flags=0), etc.
    2. **math — Mathematical functions**. This module is always available. It provides access to the mathematical functions defined by the C standard.
        * math.ceil(x), math.floor(x), math.factorial(x), math.isnan(x), math.remainder(x, y), math.exp(x), math.expm1(x) - Return e raised to the power x, minus 1, math.log(x[, base]), math.log1p(x) - Return the natural logarithm of 1+x (base e), math.log2(x) - Return the base-2 logarithm of x, math.log10(x) - Return the base-10 logarithm of x, math.pow(x, y), math.sqrt(x), math.pi, math.e, math.nan, math.inf, etc.
    3. **random — Generate pseudo-random numbers**. This module implements pseudo-random number generators for various distributions.
        * random.seed(a=None, version=2), random.randint(a, b), random.choice(seq), random.shuffle(x), random.sample(population, k), random.random() - Return the next random floating point number in the range [0.0, 1.0), random.uniform(a, b), random.normalvariate(mu, sigma), etc.
    4. **datetime — Basic date and time types**. The datetime module supplies classes for manipulating dates and times in both simple and complex ways.
        * **Available Types**. class datetime.date, class datetime.time, class datetime.datetime, class datetime.timedelta, class datetime.tzinfo, class datetime.timezone
        * **timedelta Objects**. A timedelta object represents a duration, the difference between two dates or times.
            * class datetime.timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)
            * timedelta.min, timedelta.max, timedelta.total_seconds(), mathematical operations with timedeltas, etc..
        * **date Objects**. A date object represents a date (year, month and day) in an idealized calendar, the current Gregorian calendar indefinitely extended in both directions.
            * class datetime.date(year, month, day)
            * date.min, date.max, date.year, date.month, date.day, date.replace(year=self.year, month=self.month, day=self.day), date.weekday(), date.strftime(format), etc.
        * **datetime Objects**. A datetime object is a single object containing all the information from a date object and a time object.
            * class datetime.datetime(year, month, day, hour=0, minute=0, second=0, microsecond=0, tzinfo=None, *, fold=0)
            * classmethod datetime.today(), classmethod datetime.now(tz=None), classmethod datetime.strptime(date_string, format), datetime.year, datetime.month, datetime.day, datetime.hour, datetime.minute, datetime.second, datetime.microsecond, datetime.weekday(), etc.
    5. **os.path — Common pathname manipulations**. This module implements some useful functions on pathnames.
        * os.path.exists(path), os.path.isfile(path), os.path.isdir(path), os.path.islink(path), os.path.join(path, *paths), os.path.samefile(path1, path2), os.path.split(path), etc.
    6. **glob — Unix style pathname pattern expansion**. The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order.
        * glob.glob(pathname, *, recursive=False), glob.iglob(pathname, *, recursive=False), glob.escape(pathname) - Escape all special characters ('?', '*' and '[')
    7. **statistics — Mathematical statistics functions**. This module provides functions for calculating mathematical statistics of numeric (Real-valued) data.
        * These functions calculate an average or typical value from a population or sample.

| Function | Description |
| -------- | ----------- |
| mean() | Arithmetic mean (“average”) of data.
| harmonic_mean() | Harmonic mean of data.
| median() | Median (middle value) of data.
| median_low() | Low median of data.
| median_high() | High median of data.
| median_grouped() | Median, or 50th percentile, of grouped data.
| mode() | Mode (most common value) of discrete data.

These functions calculate a measure of how much the population or sample tends to deviate from the typical or average values.

| Function | Description |
| -------- | ----------- |
| pstdev() | Population standard deviation of data.
| pvariance() | Population variance of data.
| stdev() | Sample standard deviation of data.
| variance()| Sample variance of data.
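
To make the module list more concrete, here is a small sketch that touches several of the modules above. The input strings, the path `data/run1.csv`, and the sample data are invented for illustration.

```python
import math
import os.path
import random
import re
import statistics
from datetime import date, timedelta

# re: extract every run of digits from a string.
found_numbers = [int(token) for token in re.findall(r"\d+", "a1 b22 c333")]

# math: rounding and square root.
rounded_up = math.ceil(2.1)
root = math.sqrt(16)

# random: re-seeding with the same value reproduces the same draw.
random.seed(0)
first_draw = random.sample(range(100), 3)
random.seed(0)
second_draw = random.sample(range(100), 3)

# datetime: date arithmetic with timedelta (2020 is a leap year).
later = date(2020, 2, 28) + timedelta(days=2)

# os.path: split a path into directory and file name.
directory, file_name = os.path.split("data/run1.csv")

# statistics: central tendency and spread.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean_value = statistics.mean(data)
median_value = statistics.median(data)
mode_value = statistics.mode(data)
population_sd = statistics.pstdev(data)

print(found_numbers)               # [1, 22, 333]
print(rounded_up, root)            # 3 4.0
print(first_draw == second_draw)   # True
print(later)                       # 2020-03-01
print(directory, file_name)        # data run1.csv
print(mean_value, median_value, mode_value, population_sd)   # 5 4.5 4 2.0
```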


  

* Not in the standard library. From https://www.upwork.com/hiring/data/15-python-libraries-data-science/
    1. **NumPy** is the foundational library for scientific computing in Python. NumPy introduces objects for multidimensional arrays and matrices, as well as routines that allow developers to perform advanced mathematical and statistical functions on those arrays with as little code as possible.
    2. **SciPy** builds on NumPy by adding a collection of algorithms and high-level commands for manipulating and visualizing data. This package includes functions for computing integrals numerically, solving differential equations, optimization, and more.
    3. **Pandas** adds data structures and tools that are designed for practical data analysis in finance, statistics, social sciences, and engineering. Pandas works well with incomplete, messy, and unlabeled data (i.e., the kind of data you’re likely to encounter in the real world), and provides tools for shaping, merging, reshaping, and slicing datasets.
    4. **matplotlib** is the standard Python library for creating 2D plots and graphs. It’s pretty low-level, meaning it requires more commands to generate nice-looking graphs and figures than with some more advanced libraries. However, the flip side of that is flexibility. With enough commands, you can make just about any kind of graph you want with matplotlib.
    5. **scikit-learn** builds on NumPy and SciPy by adding a set of algorithms for common machine learning and data mining tasks, including clustering, regression, and classification. As a library, scikit-learn has a lot going for it. Its tools are well-documented and its contributors include many machine learning experts. What’s more, it’s a very curated library, meaning developers won’t have to choose between different versions of the same algorithm. Its power and ease of use make it popular with a lot of data-heavy startups, including Evernote, OKCupid, Spotify, and Birchbox.
    6. **TensorFlow** is another high-profile entrant into machine learning, developed by Google as an open-source successor to DistBelief, their previous framework for training neural networks. TensorFlow uses a system of multi-layered nodes that allow you to quickly set up, train, and deploy artificial neural networks with large datasets. It’s what allows Google to identify objects in photos or understand spoken words in its voice-recognition app.
    7. **Seaborn** is a popular visualization library that builds on matplotlib’s foundation. The first thing you’ll notice about Seaborn is that its default styles are much more sophisticated than matplotlib’s. Beyond that, Seaborn is a higher-level library, meaning it’s easier to generate certain kinds of plots, including heat maps, time series, and violin plots.


### 8. Do you know any additional data structures available in the standard library?

From Python3 documentation: https://docs.python.org/3/library/index.html

* Available types from the datetime module
    * class **datetime.date**
    * class **datetime.time**
    * class **datetime.datetime**
    * class **datetime.timedelta**
    * class **datetime.tzinfo**
    * class **datetime.timezone**
* **collections — Container datatypes**. This module implements specialized container datatypes providing alternatives to Python’s general purpose built-in containers, dict, list, set, and tuple.

| Type | Description |
| ---- | ----------- |
| namedtuple() | factory function for creating tuple subclasses with named fields
| deque | list-like container with fast appends and pops on either end
| ChainMap | dict-like class for creating a single view of multiple mappings
| Counter | dict subclass for counting hashable objects
| OrderedDict | dict subclass that remembers the order entries were added
| defaultdict | dict subclass that calls a factory function to supply missing values
| UserDict | wrapper around dictionary objects for easier dict subclassing
| UserList | wrapper around list objects for easier list subclassing
| UserString | wrapper around string objects for easier string subclassing
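
A brief sketch of four of the container types from the table; the word list and the `Point` record are made up for illustration.

```python
from collections import Counter, defaultdict, deque, namedtuple

words = ["apple", "fig", "kiwi", "pear", "plum", "fig"]

# Counter: count hashable objects.
counts = Counter(words)
most_common_word = counts.most_common(1)

# defaultdict: missing keys get a fresh list from the factory function.
by_first_letter = defaultdict(list)
for word in words:
    by_first_letter[word[0]].append(word)

# deque: fast appends and pops on both ends.
queue = deque([1, 2, 3])
queue.appendleft(0)
queue.append(4)
removed = queue.popleft()

# namedtuple: a tuple subclass with named fields.
Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)

print(most_common_word)       # [('fig', 2)]
print(by_first_letter["p"])   # ['pear', 'plum']
print(removed, list(queue))   # 0 [1, 2, 3, 4]
print(p.x + p.y)              # 3
```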
    
* **array — Efficient arrays of numeric values**. This module defines an object type which can compactly represent an array of basic values: characters, integers, floating point numbers. Arrays are sequence types and behave very much like lists, except that the type of objects stored in them is constrained. The type is specified at object creation time by using a type code, which is a single character. The type codes can be found in the documentation.
    * class array.array(typecode[, initializer])
        * array.append(x), array.count(x), array.extend(iterable), array.index(x), array.insert(i, x), array.pop([i]), array.remove(x), array.tobytes(), array.tofile(f), array.tolist(), array.tounicode(), etc. (array.tostring() was a deprecated alias of array.tobytes() and was removed in Python 3.9.)
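
A minimal sketch of the type constraint, assuming the type code `"d"` (double-precision float); the readings are arbitrary values.

```python
from array import array

# "d" restricts the array to double-precision floats.
readings = array("d", [1.5, 2.5])
readings.append(4.0)
readings.extend([8.0])

# Unlike a list, the array rejects values of the wrong type.
try:
    readings.append("oops")
    wrong_type_accepted = True
except TypeError:
    wrong_type_accepted = False

as_list = readings.tolist()
raw_bytes = readings.tobytes()

print(as_list)               # [1.5, 2.5, 4.0, 8.0]
print(wrong_type_accepted)   # False
print(len(raw_bytes) == len(readings) * readings.itemsize)   # True
```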

### 9. Can you explain what a list or dict comprehension is?

Comprehensions are compact, single-expression for loops that build a list, dict, or set, optionally with an `if` filter, for example `[x * x for x in range(5)]`.
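
The sketch below shows an explicit loop next to the equivalent list comprehension, followed by a dict and a set comprehension. All names and values are illustrative.

```python
# Explicit loop: squares of the even numbers below 6.
squares_loop = []
for n in range(6):
    if n % 2 == 0:
        squares_loop.append(n * n)

# The same result as a list comprehension.
squares = [n * n for n in range(6) if n % 2 == 0]

# Dict comprehension: map each number to its square.
square_by_n = {n: n * n for n in range(4)}

# Set comprehension: distinct word lengths.
lengths = {len(word) for word in ["a", "bb", "cc"]}

print(squares == squares_loop)   # True
print(squares)                   # [0, 4, 16]
print(square_by_n)               # {0: 0, 1: 1, 2: 4, 3: 9}
print(sorted(lengths))           # [1, 2]
```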