# Top Python Packages For R Users
![](https://images.pexels.com/photos/404153/pexels-photo-404153.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1)

## Introduction 

## Top Python packages for R Users

### Data manipulation libraries

R has a rich ecosystem of data manipulation libraries. Be it dplyr, tidyr, or data.table, R users don't have much to complain about in this regard. However, they might consider trying some Python alternatives for extra flexibility, speed, and features. 

R users will find out that we, Python data scientists, totally revere the Pandas library. It is the king of data manipulation in the Python data science stack and is used by millions around the world. It has [over 20 million weekly downloads](https://pypistats.org/packages/pandas), making it one of the most popular Python packages. 

It offers such an extensive suite of functions and classes to work with data that you can barely scratch the surface even after using it for a couple of years. Pandas is also a keystone library in the ecosystem as many other dominant Python libraries are written so that their functionality aligns with Pandas' classes. 

Even though it is such a large library, its core is quick to learn. By knowing a few classes and functions, you can perform complex analyses on any dataset. 
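To give a feel for how far a handful of functions go, here is a minimal sketch that mimics a typical dplyr pipeline (mutate, filter, group_by, summarise, arrange). The `sales` table and its column names are invented for the example.

```python
import pandas as pd

# Hypothetical toy data; the column names are made up for this example.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "east", "south", "east"],
        "units": [10, 3, 7, 12, 8, 1],
        "price": [2.5, 4.0, 2.5, 3.0, 4.0, 3.0],
    }
)

summary = (
    sales
    .assign(revenue=lambda d: d["units"] * d["price"])  # like dplyr::mutate
    .query("units > 2")                                  # like dplyr::filter
    .groupby("region", as_index=False)                   # like dplyr::group_by
    .agg(                                                # like dplyr::summarise
        total_revenue=("revenue", "sum"),
        n_orders=("units", "size"),
    )
    .sort_values("total_revenue", ascending=False)       # like dplyr::arrange(desc(...))
    .reset_index(drop=True)
)
print(summary)
```

Method chaining plays roughly the role that the pipe operator plays in R, so the whole analysis reads top to bottom.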

If Pandas sounds too intimidating to R lovers, they might feel right at home by using the Python datatable package. It was inspired by its R namesake and written solely to deal with the massive datasets of today. It has the ability to read and manipulate gigabyte-sized datasets in mere seconds. 

A common use case is to read a large dataset with datatable and then convert it to a Pandas DataFrame, which is much faster than reading the file with Pandas alone (Pandas is only fast for small-to-medium sized datasets). But, as an R user, you don't even have to do that, as datatable's syntax is almost the same as that of R's data.table package. 

As far as I know, GPU support isn't fantastic in R, so R users might finally want to try out some GPU power by using Python libraries. RAPIDS offers just that opportunity via the cuDF library. cuDF is a dataframe library for manipulating datasets with billions of rows by tapping into the computing power of NVIDIA GPUs. Another advantage of cuDF is that its syntax is very similar to that of Pandas. 

### Data visualization libraries

I know ggplot2 is a dataviz legend. It is probably the most loved and used library that ever graced data science. However, and I know R lovers are going to heartily disagree with me on this, the same can probably be said about Matplotlib. 

Matplotlib is one of the first libraries people are introduced to when they start learning data science in Python. It is one of the rare libraries that manage to keep complexity and flexibility in the perfect balance. In other words, it is easy enough for beginners to learn and create great charts with, while also having all the tools experienced folks need to create [truly amazing custom plots](https://ibexorigin.medium.com/yes-these-unbelievable-masterpieces-are-created-with-matplotlib-b62e0ff2d1a8). 
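As a small illustration of the beginner-friendly end of that range, the sketch below draws two labeled lines on a single Axes using Matplotlib's object-oriented interface. The data is synthetic and the styling choices are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

# One Figure holding one Axes; every plot element is added to the Axes.
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), linestyle="--", label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Two lines on one Axes")
ax.legend()
```

From here, `fig.savefig("plot.png")` writes the chart to disk (the file name is just an example). Adding elements to an Axes one call at a time is loosely comparable to adding layers in ggplot2.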

Just looking at its download stats tells a lot about its widespread adoption in the community:
![](images/1.png)

If Matplotlib sounds like a bit much or you don't like its default styles (no matter, Pythonistas aren't fans of them either), you can check out Seaborn. It is a wrapper API around Matplotlib that makes it considerably easier for beginners to use. More importantly, it places great emphasis on making the plots as pretty as possible without much tweaking. Seaborn also introduces new plot types and subplot tools that aren't easily available in Matplotlib. 

If you are sick of old-fashioned static plots, then you can try Python's family of interactive data visualization libraries. The head of this family is, of course, Plotly, which has deep roots in R as well. It is great at producing high-quality charts out of the box and provides interfaces to customize and create complex charts. If you are unsure whether to choose Matplotlib or Plotly, here is a [detailed comparison article](https://towardsdatascience.com/matplotlib-vs-plotly-lets-decide-once-and-for-all-dc3eca9aa011?gi=a7131eddc342). Altair and Bokeh both deserve a mention here as they have their own fanbases because of their unique look and syntax. 

### Math and statistical libraries

It is true that native Python does not come loaded with a host of statistical functions like R does. But its libraries more than make up for this shortcoming. 

The first one is the mighty NumPy, which carries many other key Python packages on its shoulders. It is a superb array manipulation library with a rich selection of vectorized math functions, so R users, who already think in vectors, should feel at home. Because its array operations run in compiled code, its speed in matrix manipulation is far beyond that of plain Python loops and is often mentioned alongside Julia, a language designed for fast numerical computing. 
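Here is a minimal sketch of what vectorized NumPy code looks like, with R equivalents noted in the comments. The numbers are arbitrary.

```python
import numpy as np

heights_cm = np.array([150.0, 160.0, 170.0, 180.0])

# Vectorized arithmetic: no explicit loop, much like operating on an R vector.
heights_m = heights_cm / 100

# NumPy's std() divides by n by default; ddof=1 gives the n - 1 version that R's sd() uses.
z_scores = (heights_cm - heights_cm.mean()) / heights_cm.std(ddof=1)

# Matrix algebra: solve A @ x = b, the counterpart of R's solve(A, b).
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(x)
```

One thing to watch for when coming from R: `*` on two NumPy arrays is element-wise, and `@` is matrix multiplication.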

NumPy's n-dimensional arrays are the model for the tensors in other computational libraries like TensorFlow and PyTorch, both of which convert to and from NumPy arrays. Partly for this reason, NumPy's download stats are much higher than even Pandas and Matplotlib's put together:

![](images/2.png)

If you can't find a function in NumPy, then SciPy is the answer. It has separate submodules for a range of computational problems in math, physics and statistics. 

Its `special` module contains key mathematical physics functions for researchers, all written for speed. You can solve optimization problems using its `optimize` module, while the `integrate` and `fft` modules take care of numerical integration and Fourier transforms. 
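A quick tour of these submodules might look like the sketch below. Note that the optimization submodule is imported as `scipy.optimize`. The test functions and the 5 Hz signal are arbitrary choices for illustration.

```python
import numpy as np
from scipy import fft, integrate, optimize, special

# special: the gamma function, where gamma(5) = 4! = 24
gamma_5 = special.gamma(5)

# optimize: minimize (x - 2)^2 + 1, similar in spirit to optimize() in base R
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)

# integrate: the integral of exp(-x^2) over the real line, which equals sqrt(pi)
area, abs_error = integrate.quad(lambda x: np.exp(-x ** 2), -np.inf, np.inf)

# fft: find the dominant frequency of a 5 Hz sine sampled at 100 Hz for one second
sample_rate = 100
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 5 * t)
spectrum = fft.rfft(signal)
freqs = fft.rfftfreq(signal.size, d=1 / sample_rate)
peak_freq = freqs[np.argmax(np.abs(spectrum))]

print(gamma_5, result.x, area, peak_freq)
```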

Its `linalg` module contains everything in NumPy's `linalg` module plus more advanced and niche linear algebra functions. SciPy is always built with BLAS/LAPACK support (the standard low-level libraries for linear algebra), so its version can be faster than NumPy's, depending on how NumPy was installed. 
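Two examples of routines that live in `scipy.linalg` but not in `numpy.linalg` are the LU decomposition and the matrix exponential. A minimal sketch, using an arbitrary 2x2 matrix:

```python
import numpy as np
from scipy import linalg

A = np.array([[4.0, 3.0], [6.0, 3.0]])

# LU decomposition with partial pivoting: A = P @ L @ U
P, L, U = linalg.lu(A)
reconstructed = P @ L @ U

# Matrix exponential; for the zero matrix it is the identity matrix
identity = linalg.expm(np.zeros((2, 2)))

print(reconstructed)
print(identity)
```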

Oh, and did I mention multi-dimensional image processing? NumPy arrays can store images of any dimension, but NumPy itself offers few tools for processing them, especially the higher-dimensional images from medicine and biology. This is where you can use SciPy's `ndimage` module.
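As a sketch of what that looks like, the snippet below smooths a made-up 4-D array (imagine a 3-D scan recorded over time) along its three spatial axes only. The array shape, the axis order and the `sigma` values are assumptions for the example.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

# Hypothetical 4-D "scan": axes are (time, z, y, x), filled with random noise.
volume = rng.random((4, 16, 16, 16))

# One sigma per axis: 0 leaves the time axis untouched, 1 blurs each spatial axis.
smoothed = ndimage.gaussian_filter(volume, sigma=(0, 1, 1, 1))

print(volume.var(), smoothed.var())
```

The same call works unchanged on 2-D photos or 3-D volumes, since the filters in `ndimage` are written for arrays of any dimension.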

Up until now, we have focused more on math than statistics. For this reason, let me introduce you to statsmodels. It is a vast library with functions and classes to estimate many statistical models, to conduct hypothesis tests and to explore data. 

There are entire sets of functions for regression analysis as well as mature APIs for generalized linear models. I especially love its time series analysis module, as it contains specialized functions to analyze and visualize time series. 

Its other modules can be used for survival/duration analysis, nonparametric methods and multivariate statistics. And to R users' delight, most of these modules use R-like syntax both for specifying models and for printing their output. All in all, statsmodels builds on NumPy, SciPy and Matplotlib to form a complete toolkit for statistical analysis.
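The R-like syntax is easiest to see in the formula interface. The sketch below fits a linear regression on simulated data; the column names and the true coefficients (intercept 1, slope 2) are invented for the example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data with a known relationship: y = 1 + 2*x + 0.5*(group == "b") + noise
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame(
    {
        "x": rng.normal(size=n),
        "group": rng.choice(["a", "b"], size=n),
    }
)
df["y"] = 1.0 + 2.0 * df["x"] + 0.5 * (df["group"] == "b") + rng.normal(scale=0.1, size=n)

# The formula string works much like lm(y ~ x + factor(group), data = df) in R.
model = smf.ols("y ~ x + C(group)", data=df).fit()
print(model.summary())
```

The printed summary table should look familiar to anyone who has called `summary()` on an `lm` object in R.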

Another wonderful math library is SymPy (Symbolic Python), which brings symbolic mathematics, in the style of a computer algebra system, to Python. 
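Symbolic here means that SymPy manipulates expressions rather than numbers. A minimal sketch, with arbitrary example expressions:

```python
import sympy as sp

x = sp.symbols("x")
expr = sp.sin(x) * sp.exp(x)

# Differentiate symbolically (product rule applied for you)
derivative = sp.diff(expr, x)

# Integrate 2*x with respect to x
antiderivative = sp.integrate(2 * x, x)

# Solve x^2 - 5x + 6 = 0 exactly
roots = sp.solve(x**2 - 5 * x + 6, x)

print(derivative)
print(antiderivative)
print(roots)
```

R users may know base R's `D()` for symbolic derivatives; SymPy extends the same idea to integration, equation solving, simplification and more.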

### Machine learning libraries

### Deep learning libraries

## Beyond the language wars: R and Python for the Modern Data Scientist