# Why use Pandas and Python when you already know Excel in and out?

This is a valid question, and one that may have crossed your mind more than once.

## 1. It enables you to work with all sorts of data, not just Excel spreadsheets.

While MS Excel is one of the most widely used tools for entering data, many other ways of storing data are becoming increasingly common.

Python and Pandas can handle much more than Excel sheets. As you have seen earlier, CSV files are easily imported. However, that is just one of many options available. Pandas can also read from SQL databases and from XML, JSON, SPSS, SAS and Stata data files, while regular Python can interface with NoSQL databases (like MongoDB) and then hand the output over to Pandas. This means more power to you!

![image.png](attachment:image.png)
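To make this concrete, here is a minimal sketch that loads the same small table from three different sources. The data, the table name `scores`, and the column names are made up for illustration, and an in-memory SQLite database stands in for a real database server.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample data, made up for this illustration.
csv_text = "name,score\nAsha,91\nBen,78\n"
json_text = '[{"name": "Asha", "score": 91}, {"name": "Ben", "score": 78}]'

# CSV and JSON: wrap the text in StringIO so pandas treats it like a file.
from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text))

# SQL: an in-memory SQLite database stands in for a real database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [("Asha", 91), ("Ben", 78)])
from_sql = pd.read_sql("SELECT name, score FROM scores", conn)
conn.close()

# All three sources end up as the same kind of object: a DataFrame.
for label, frame in [("csv", from_csv), ("json", from_json), ("sql", from_sql)]:
    print(label, frame.shape, list(frame.columns))
```

Whatever the source, the result is a DataFrame, so everything you do afterwards is the same.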

## 2. Easier Automation and Reproducibility.

When a lengthy analysis consists of many steps, automating it in Excel with macros takes a lot of remembering or grunt work. You need to perform the steps with actual button clicks, and in general you have to re-enter all the formulae in each Excel sheet where you want to perform the analysis.

In Python + Pandas, we can implement such functionality in a single script, and then run the script on many Excel files, without having to copy-paste or re-enter any formulae. More time for you, and less effort. So it's like we work very hard now to be extremely lazy later :)

An analysis done in Python is also highly reproducible, because all the steps that you take to reach the final result are already written in code. You don't have to remember the specific order in which you performed those steps the first time, because the code records it for you!

This is not always possible in Excel, because many of the steps are performed through the graphical interface and leave no record, so they can get lost among the many procedures we need to follow.
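As a rough sketch of what "one script, many files" can look like: the function `summarize`, the `amount` column, and the file names below are all hypothetical. The example writes two small CSV files to a temporary folder so that it runs on its own. For real Excel workbooks you would call `pd.read_excel` instead of `pd.read_csv`, which needs an Excel engine such as `openpyxl` installed.

```python
import tempfile
from pathlib import Path

import pandas as pd


def summarize(df):
    """The 'analysis': total and average of a hypothetical `amount` column."""
    return {"total": df["amount"].sum(), "average": df["amount"].mean()}


with tempfile.TemporaryDirectory() as folder:
    folder = Path(folder)
    # Create two small made-up files so the example runs on its own.
    pd.DataFrame({"amount": [10, 20, 30]}).to_csv(folder / "january.csv", index=False)
    pd.DataFrame({"amount": [5, 15]}).to_csv(folder / "february.csv", index=False)

    # The same steps run on every file, in the same order, every time.
    results = {
        path.stem: summarize(pd.read_csv(path))
        for path in sorted(folder.glob("*.csv"))
    }

# One row per file, one column per computed value.
report = pd.DataFrame.from_dict(results, orient="index")
print(report)
```

Adding a third file to the folder requires no change to the script, which is the whole point.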

## 3. Enables easy working with huge datasets

Excel, like any other spreadsheet program, has an upper limit on the amount of data it can handle: a single worksheet holds at most 1,048,576 rows and 16,384 columns. It also keeps the entire workbook in memory, so on a less powerful system your computer can start to lag long before you reach that limit.

Python (+ other modules) can work through such data without holding all of it in memory at once. `pandas.read_csv` has a `chunksize` parameter that lets you read a file in fixed-size chunks that your computer can handle comfortably. As a result, you can run the analysis on a comparatively less powerful machine.
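Here is a minimal sketch of chunked reading. A ten-row CSV built in memory stands in for a file that is too large to load at once. The column names and the chunk size of 4 are arbitrary choices for illustration.

```python
import io

import pandas as pd

# A small made-up CSV stands in for a file too large to load at once.
csv_text = "order_id,amount\n" + "\n".join(f"{i},{i * 2}" for i in range(1, 11))

total = 0
rows_seen = 0
# With chunksize set, read_csv yields DataFrames of at most that many rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["amount"].sum()
    rows_seen += len(chunk)

print(rows_seen, total)
```

Only one chunk is in memory at a time, so the running total can be computed over a file of any length. This works well for calculations that can be accumulated piece by piece, such as sums and counts.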

## 4. Easier to find and fix errors

When you’ve made an error in Excel, figuring out what’s gone wrong can be difficult, since you might have to scroll through thousands of cells of data to find the answer, or attempt to manually re-trace your steps.

But when you make an error in Python, you’ll usually get an error message that names the kind of error and points to the line where it happened. And of course, you can also have comments explaining each line of your code, making it easier to go back and re-check each step looking for mistakes. Typically, programmers also use a version control system, so if you run into an error you haven’t seen before, you’ll be able to compare your current code with its previous version to get a sense of what’s gone wrong. (GitHub, for example, lets you view such comparisons.)
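As a small illustration, here is what happens when a column name is mistyped. The DataFrame and the misspelled name `prise` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"price": [10, 20, 30]})

try:
    # Typo on purpose: the column is called "price", not "prise".
    average = df["prise"].mean()
except KeyError as err:
    # The exception names the key that could not be found.
    message = f"KeyError: {err}"
    print(message)
```

Without the `try`/`except`, Python would stop and print a traceback ending in the same `KeyError`, along with the line number of the failing statement.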

## 5. Free and Open Source

Excel is neither open source nor free. You need some form of access to it, either by buying a Microsoft 365 subscription or another edition of Office, or, if you are lucky, through a workplace or university that provides it at no cost to you. If you ever lose that access, you'll have to fall back on free alternatives such as LibreOffice Calc (which is open source) or Google Sheets. Python, on the other hand, is released under an open-source license, so it is dependable software that you can keep using without paying for it.

Moreover, because Python is open source, if you ever need a specific feature, you can always code it yourself and contribute it to the open-source community so that everyone benefits from your code. This is not possible with Excel.

Also, the desktop version of Excel runs only on Windows and macOS. If you have ever used Linux, you know the pain of not having MS Office on your computer. Python, by contrast, runs on Windows, macOS and Linux, and with some extra setup on iOS, Android, and various other platforms. Definitely worth it.

## 6. Machine Learning and Advanced Statistics Capabilities

Excel has very good statistics capabilities, but Python has much, much more. Excel also has no built-in machine learning libraries, so if you want to do anything beyond basic regression, you would typically need to write out the entire algorithm yourself in VBA or something similar.

In Python, such libraries are available for free, so you don't need to write code by hand every time you want to use a state-of-the-art algorithm: it is already implemented for you. At the same time, you are free to write your own implementation if you want to. That combination is hard to match in Excel.
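As a modest sketch of "already implemented for you", the example below fits a straight line and computes summary statistics without writing any of the maths by hand. It uses NumPy, which is installed alongside Pandas. The data is made up to follow y = 3x + 2 exactly, so the result is easy to check. Dedicated libraries such as scikit-learn offer a far wider range of models.

```python
import numpy as np
import pandas as pd

# Made-up data that follows y = 3x + 2 exactly, so the fit is easy to check.
df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0, 4.0]})
df["y"] = 3 * df["x"] + 2

# Fit a straight line (degree-1 polynomial) by least squares.
slope, intercept = np.polyfit(df["x"], df["y"], deg=1)

# Summary statistics and correlations are one call each.
summary = df.describe()
correlation = df["x"].corr(df["y"])

print(round(slope, 3), round(intercept, 3), round(correlation, 3))
```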

You also get much better plotting tools in general, leaps and bounds ahead of Excel's charts, with very fine control over the details.

# Pandas for Excel Super Users

Let's start from the beginning, one last time:

In [1]:
import pandas as pd

In [None]:
pd.read