# Operating Systems

> An operating system, often abbreviated as OS, is a very large, complex piece of software that controls how a computer runs. It manages hardware resources, schedules tasks, and provides an interface through which users and other software interact with the computer's hardware. It handles fundamental tasks such as managing files on a storage device, allocating system memory, and mediating between software applications and hardware components like printers, disk drives and displays. For programmers, it also provides services through APIs (Application Programming Interfaces), which let you write programs that perform actions like reading from a file or sending data over a network.
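
As a rough illustration of the OS acting as an intermediary between your program and the hardware, the Python sketch below uses the standard-library `os` module. Its `os.open`, `os.write`, `os.read` and `os.close` functions are thin wrappers around the operating system's own file APIs. The function name `write_and_read` and the file name `example.txt` are hypothetical, and the file is created in a temporary directory so the example cleans up after itself.

```python
import os
import tempfile

def write_and_read(text: str) -> str:
    """Write text to a file and read it back using low-level OS calls."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "example.txt")  # hypothetical file name

        # Ask the OS to create the file; it returns a file descriptor (an integer handle)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT)
        try:
            os.write(fd, text.encode("utf-8"))
        finally:
            os.close(fd)  # hand the descriptor back to the OS

        # Ask the OS to reopen the same file, this time for reading
        fd = os.open(path, os.O_RDONLY)
        try:
            data = os.read(fd, 1024)  # read up to 1024 bytes
        finally:
            os.close(fd)

    return data.decode("utf-8")

print(write_and_read("Hello from the OS!"))
```

In everyday Python you would normally use the higher-level `open()` function, which calls these same OS services for you.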

## Motivation

In this lesson we will learn about the different popular 'flavours' of operating system, and their advantages and disadvantages with regard to the work you will do in this course. This will help you make an informed choice about which OS to use.

## Operating System Components

Because operating systems are so large and complex, a full description of everything they are responsible for is beyond the scope of this lesson. However, some aspects of an OS are especially important for data professionals. We introduce them briefly below, and you will encounter them again as you go through the course:

**File System:** An OS's file system is important for data scientists because it determines how files are stored, organized, and retrieved. This can affect the performance of data processing tasks, especially when dealing with large datasets.
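
Besides the contents of each file, the file system also records metadata such as its size and when it was last modified. Here is a minimal sketch of reading that metadata with Python's `pathlib`; the helper `describe_file` and the `data.csv` file are hypothetical, created in a temporary directory just for the demo.

```python
import tempfile
from datetime import datetime
from pathlib import Path

def describe_file(path: Path) -> dict:
    """Return some of the metadata the file system keeps about a file."""
    info = path.stat()  # asks the OS for the file's metadata
    return {
        "name": path.name,
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime),
    }

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "data.csv"  # hypothetical data file
    csv_path.write_bytes(b"id,value\n1,42\n")
    print(describe_file(csv_path))
```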

**Memory Management:** The OS's approach to managing memory can impact the performance of data-intensive tasks. Good memory management ensures that applications have the resources they need to process large datasets efficiently.

**Command Line Interface:** The command line interface is a system which allows you to control your operating system by typing in specific text instructions. Different operating systems implement this feature differently, and this can influence your choice of which OS to use.

**Process Management and Scheduling:** The OS decides how multiple tasks (or processes) share system resources such as CPU time and memory. For example, a data scientist might run several data analysis tasks in parallel, and the OS's process management and scheduling mechanisms determine which task runs on which CPU core, and for how long, so that all of them make steady progress.
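
To make this concrete, the sketch below runs several independent "analysis" tasks in separate processes using Python's standard `concurrent.futures.ProcessPoolExecutor`. The task itself (`summarise`, a sum of squares) and the helper `run_in_parallel` are hypothetical stand-ins for real work; your program only hands the work to a pool of processes, and the OS scheduler decides when and where each one runs.

```python
import os
from concurrent.futures import ProcessPoolExecutor

def summarise(numbers: list[int]) -> int:
    """Hypothetical stand-in for a data analysis task: the sum of squares."""
    return sum(n * n for n in numbers)

def run_in_parallel(datasets: list[list[int]]) -> list[int]:
    # One worker process per CPU core; the OS scheduler assigns them to cores
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        # map() returns the results in the same order as the inputs
        return list(pool.map(summarise, datasets))

if __name__ == "__main__":  # required on Windows and Mac OS, which start fresh processes
    datasets = [list(range(1_000)), list(range(2_000)), list(range(3_000))]
    print(run_in_parallel(datasets))
```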

**Security Features:** Data security is vital in any data science project. An OS's security features, such as its file permissions system and firewall, together with how vulnerable the OS software itself is to exploits, affect how well your data is protected from unauthorized access.
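
On Unix-type systems, every file carries read, write and execute permissions for its owner, its group and everyone else. The following is a minimal sketch, assuming a POSIX system (Linux, Mac OS or WSL), of restricting a file so only its owner can read and write it. The helper `restrict_to_owner` is hypothetical, and on native Windows `os.chmod` can only toggle the read-only flag, so the result there will differ.

```python
import os
import stat
import tempfile

def restrict_to_owner(path: str) -> str:
    """Make a file readable and writable by its owner only; return the new mode string."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # same as `chmod 600` on the command line
    return stat.filemode(os.stat(path).st_mode)

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    secret_path = tmp.name  # stands in for a file containing sensitive data

print(restrict_to_owner(secret_path))  # on Linux/Mac OS: -rw-------
os.remove(secret_path)
```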

**Interoperability:** Data professionals use a variety of software tools and languages for their work. The ability of an OS to interoperate smoothly with these different tools can be a crucial factor in deciding which one to use.




## Unix-type Operating Systems

> UNIX is a powerful, multi-user and multitasking operating system originally developed in the 1970s at Bell Labs by Ken Thompson, Dennis Ritchie, and others. It's known for its portability, flexibility, and robustness, and it has significantly influenced many other operating systems, including Linux and Mac OS, which are often referred to as Unix-like or Unix-based because they share similar concepts and structures.

In UNIX, almost everything is treated as a file, including hardware devices such as disks and keyboards, so the same simple operations (open, read, write, close) can be used to interact with all of them. This makes its architecture elegant and easy to work with. UNIX also popularised many concepts that are still widely used today, such as the hierarchical file system, command-line shells (of which Bash is a modern example), and shell scripting.
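
For a flavour of the "everything is a file" idea, the sketch below writes to the null device, a special file representing a device that discards whatever is written to it. It uses exactly the same `open()` call as an ordinary file. Python's `os.devnull` gives the device's path (`/dev/null` on Unix-type systems); the helper `write_to` is hypothetical.

```python
import os

def write_to(path: str, text: str) -> int:
    """Write text to any path (ordinary file or device) and return the characters written."""
    with open(path, "w") as f:
        return f.write(text)

print(os.devnull)                          # /dev/null on Linux and Mac OS
print(write_to(os.devnull, "discarded!"))  # 10: the device accepts the data, then throws it away

# Reading from the null device behaves like reading an empty file
with open(os.devnull) as f:
    print(repr(f.read()))                  # ''
```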



## Operating System Choices

In general, the work that you undertake in this course can be done on any operating system. If we had to make a recommendation, it would be for Linux, as the majority of the software we use is open-source and runs very well on Linux systems. That said, you can use Mac OS (which shares much of its design with Linux) without any major modifications, and Windows with a few necessary changes.
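
One difference you are likely to run into between operating systems is how file paths are written: Windows separates folders with backslashes, while Linux and Mac OS use forward slashes. Below is a sketch of one common way to keep Python code portable, using the standard-library `pathlib` and `platform` modules. The `data/raw/sales.csv` path is a made-up example.

```python
import platform
from pathlib import Path, PurePosixPath, PureWindowsPath

# Build the path from its parts instead of hard-coding "/" or "\" separators
relative = Path("data", "raw", "sales.csv")  # hypothetical data file

print(platform.system())  # e.g. 'Linux', 'Darwin' (Mac OS) or 'Windows'
print(relative)           # uses the separator of the OS the code is running on

# The same parts rendered for each OS, regardless of where this runs
print(PurePosixPath("data", "raw", "sales.csv"))    # data/raw/sales.csv
print(PureWindowsPath("data", "raw", "sales.csv"))  # data\raw\sales.csv
```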

### Linux

Linux is a Unix-like operating system which is highly regarded by data professionals for its flexibility, control, and extensive customization options. It is open-source software, meaning the code that makes it run can be viewed and edited by anyone. This allows users to tailor the environment to their exact requirements, which can be a significant advantage when setting up complex data processing workflows. Its powerful command-line interface allows for efficient processing and automation of tasks, and Linux is widely used in server environments and cloud platforms (where you use the internet to access another computer for your processing and data storage). Linux therefore offers users a consistent experience between their local and remote work environments.

On the downside, Linux can have a steeper learning curve, especially for those accustomed to Windows or Mac OS interfaces. While its software library is vast, some commercial software, particularly programs designed for Windows or Mac OS, may not have a Linux version.

#### Linux Distributions

Unlike the other operating system flavours listed here, Linux is not a single OS, but comes in several varieties, known as distributions (or "distros"). As Linux is open-source, anyone can create a version of it, so a Linux distribution is the Linux kernel packaged together with a particular selection of other software by an individual or organization. Each distribution can have different default settings, software packages, and user interfaces, tailored to different types of users or use cases. Some well-known distributions include Ubuntu, Fedora, and Debian, each with its own characteristics, strengths, and community support.

For this course, we recommend using Ubuntu if you are trying Linux for the first time. It is stable and easy to install and use, and although less customisable than some other distros, it will be more than adequate for the tasks you'll need to do in this course. 
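
Most current Linux distributions describe themselves in a standard file, `/etc/os-release`. Here is a minimal sketch of reading it from Python, assuming Python 3.10 or later (where `platform.freedesktop_os_release()` was added). The helper `linux_distribution` is hypothetical; on Mac OS or Windows the file does not exist, so it returns `None`.

```python
import platform
from typing import Optional

def linux_distribution() -> Optional[str]:
    """Return a human-readable distribution name, or None if it can't be determined."""
    try:
        info = platform.freedesktop_os_release()  # parses /etc/os-release (Python 3.10+)
    except (AttributeError, OSError):
        # AttributeError: Python older than 3.10; OSError: no os-release file (e.g. Mac OS, Windows)
        return None
    return info.get("PRETTY_NAME", info.get("NAME"))

print(linux_distribution())  # the distribution's name on Linux, None elsewhere
```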

#### Pros 

- Open-source kernel: the source code is publicly available ([Linux kernel on GitHub](https://github.com/torvalds/linux))
- Good open source software support (most of the open source tools are created with this OS in mind)
- __Free to use:__ most distributions are free to download and use, and many are maintained by volunteer communities or non-profit organizations
- Many distributions, such as Ubuntu, are easy to use out of the box
- Quite modifiable

#### Cons

- Usually no direct support from a large company (although some, like Canonical for Ubuntu, sell paid support)
- Some third-party commercial software is not compatible with Linux
- __Less intuitive__: although many Linux distributions have user interfaces very similar to Windows and Mac OS, some things work a little differently


### Mac OS

Mac OS is another Unix-like operating system, which comes with many of the same advantages as Linux. Open-source software usually supports Mac OS as well as Linux because of their similar architecture. Mac OS also offers a highly stable, secure, and user-friendly environment, supported by a company (Apple) which can offer technical support if something goes wrong. However, Mac OS runs exclusively on Apple hardware, which can be expensive and less customizable than the alternatives. Some software may also have compatibility issues with newer Apple computers, as Apple now designs its own ARM-based processors (Apple silicon), which use a different architecture from the x86 processors made by Intel and AMD.
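
If you need to check which kind of processor a machine has (for example, before installing software that only ships for one architecture), Python's `platform.machine()` reports it. Apple-silicon Macs typically report `'arm64'`, while Intel and AMD machines usually report `'x86_64'` (or `'AMD64'` on Windows). The helper `is_arm_machine` is hypothetical, and note that a Python interpreter running under Apple's Rosetta translation layer reports `'x86_64'` even on Apple silicon.

```python
import platform

def is_arm_machine() -> bool:
    """Hypothetical helper: True if the interpreter reports an ARM-based processor."""
    arch = platform.machine().lower()
    # Mac OS and Windows say 'arm64', Linux says 'aarch64' (or e.g. 'armv7l' on 32-bit ARM)
    return arch in {"arm64", "aarch64"} or arch.startswith("arm")

print(platform.machine())  # e.g. 'arm64', 'x86_64' or 'AMD64'
print(is_arm_machine())
```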

#### Pros

- Good open source software support (its Unix foundation, Darwin, incorporates code from the open-source FreeBSD project)
- Supported by large company
- Easy to use out of the box
- Quite modifiable
- Unix-based OS 

#### Cons

- __Closed source__: apart from its open-source Darwin core, the source code of Mac OS is not available to read or customise
- __Tied to Apple hardware__: it is generally only possible to use Mac OS on Apple machines


### Windows

> Windows is the most widely-used desktop operating system, and is especially popular for home and business use. Historically it was considered a less common choice for developers and programmers, but this is no longer the case. However, it remains less popular among data professionals, mainly because most of the tools used for data analysis are open source, and support for them tends to focus on Unix-type operating systems.

Windows has the advantage of seamless integration with the Microsoft Azure cloud platform, which is widely used in business, and if you work for a company which uses Microsoft products extensively, your data work may well happen on Windows too. On the other hand, it can fall short of Unix-based systems (like Linux and Mac OS) in tasks such as scripting and process control, which are often important in a data science or data engineering context. Using Windows for this course is certainly possible, but you may need to install a few extra tools along the way to make it compatible with the software we use.

#### Pros

- Widely used for everyday, non-professional computing
- Supported by large company
- Easy to use out of the box
- Seamless integration with the Microsoft Azure cloud platform

#### Cons

- __Closed source__: the source code of Windows is not publicly available; it is distributed only as compiled software
- __Not free__: a licence must be purchased (often included in the price of a new computer)
- Generally less well-supported by open source software (which we will use)
- Hard to modify and tune for our own needs
- Uses a different default terminal shell (PowerShell). We will learn more about the terminal in the next lesson!

#### WSL

Some specific software used on this course, particularly in the Data Engineering specialisation, will not run on Windows, so you will need to install Windows Subsystem for Linux (WSL). This is a Windows feature that lets developers install a Linux distribution and run Linux applications and utilities directly on Windows, unmodified. This is usually a more effective approach than using a traditional virtual machine (a piece of software that acts like a separate computer running inside your existing computer) or a dual-boot setup (where you have two completely separate operating systems on your hard disk). Virtual machines can be slow and take up a lot of memory, whereas with a dual-boot setup you have to choose which OS to use when you start the computer, and you cannot easily access the files belonging to the other OS.

To install WSL:

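- Open a PowerShell terminal as administrator: search for PowerShell in the Start menu, right-click it and select **Run as administrator**

- Type the command `wsl --install` and press Enter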

<p align="center">
    <img src="images/WSL_install.gif"  width="700"/>
</p>
<br>

- Wait for the installation to complete, and then restart your machine

- You can now open WSL by typing `WSL` into the Start menu

<p align="center">
    <img src="images/open_WSL.gif"  width="700"/>
</p>
<br>

During the course, you will be instructed to use WSL when it is necessary for a particular task which uses software that does not run well on Windows.
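
When you switch back and forth between Windows and WSL, it can help to confirm which environment a script is actually running in. A commonly used heuristic (not an official API) is that the Linux kernel shipped with WSL includes "microsoft" in its release string. The helper `running_in_wsl` below is a hypothetical sketch built on that assumption.

```python
import platform

def running_in_wsl() -> bool:
    """Heuristic check: WSL kernels usually report 'microsoft' in their release string."""
    uname = platform.uname()
    return uname.system == "Linux" and "microsoft" in uname.release.lower()

print(platform.uname().release)
print(running_in_wsl())  # typically True inside WSL; False on native Linux, Mac OS or Windows
```

Inside WSL, your Windows drives are mounted under `/mnt/` by default, so the `C:` drive appears as `/mnt/c`, which is how you can reach files saved from Windows.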

## Key Takeaways

- An operating system (OS) is a complex piece of software that manages hardware resources, schedules tasks, provides a user interface, and handles core functions like file and memory management
- There are various different flavours of operating system, including Linux, Mac OS and Windows
- The Unix operating system was developed in the 1970s, and its design has influenced many modern operating systems, including Mac OS and Linux
- Linux is an open-source, Unix-like OS, which works well with the open-source software used for many data applications
- Mac OS is a proprietary Unix-type OS which only works on Apple hardware, but is compatible with most open-source software
- Windows is the world's most widely-used desktop OS and is very powerful, but some of the open-source software used in data work is less well supported on it
- You can install a Linux distribution on Windows using Windows Subsystem for Linux (WSL)