# Challenge 3 | Optimize Code With NumPy

As a computational scientist dealing with large data sets, you'll do a lot of coding, and potentially a lot of waiting. While that might leave you with some extra time to ponder the mysteries of the Universe (or take a nap), when you're waiting for results that could help you answer an important question that you've been mulling over for a while, you don't want to wait any longer than you have to. That's why you go beyond just getting your code to work properly, and strive to optimize it to run as quickly and intelligently as possible.

In computational research in the physical sciences, it is extremely common to compute averages or other global properties of a system of particles, which requires you to loop over lists or arrays, compute sums, and so on. For example, in astrophysics, if you are simulating the evolution of a star cluster or a galaxy, you might set up a large number of “particles” (e.g., stars, or clumps of stars) and then evolve them according to the physics of the system. Periodically, you assess how the system is evolving by computing various global properties, such as, in this example, the center of mass.

There are faster and slower ways to do this, where “slow” can refer both to how long it takes the scientist to write the program and to how long it takes the computer to perform the calculations. Simulations like the one described above typically include very large numbers of particles, up to millions or even billions, so computations that loop over all particles can quickly become a bottleneck for your code.

In this challenge, you'll explore this example of computing the center of mass of a system of particles in three dimensions. For a system of $n$ particles, each described by its 3D position coordinates and mass, the center of mass, or COM, of the system can be computed with the following formula:

![title](formula.svg)

Here, $x_{cm}$ represents the x coordinate of the COM, and the other COM coordinates $y_{cm}$ and $z_{cm}$ can be computed with analogous equations.
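
In case the formula image does not render, here is the standard mass-weighted definition written out. It assumes the figure shows the usual form, and the index $i$ and the symbols $m_i$ and $x_i$ (mass and x position of particle $i$) are notation chosen here for illustration:

$$
x_{cm} = \frac{\sum_{i=1}^{n} m_i x_i}{\sum_{i=1}^{n} m_i}
$$

As a quick check, two particles with masses 1 and 3 located at $x = 0$ and $x = 10$ give $x_{cm} = (1 \cdot 0 + 3 \cdot 10)/(1 + 3) = 7.5$, which sits closer to the heavier particle.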

__A version of this code is in the cell below. Your job is to optimize it.__

After analyzing the code provided, follow the instructions below to optimize this code, which we recommend doing in different cells within the same notebook. Be sure to retain the original code for comparison later!

In [2]:
# Challenge Code: Version 1 (Original)
import random

# Set up initial system of particles
x, y, z, m = [], [], [], []
for i in range(1000000): 
    x.append(random.uniform(0,100))
    y.append(random.uniform(0,100))
    z.append(random.uniform(0,100))
    m.append(random.uniform(1,10))

# Calculate the COM of the system (writing as a function makes it easier to time it!)
def f1_calc_com(x, y, z, m): 
    xmsum = 0
    for j in range(len(x)):
        xmsum = xmsum + x[j]*m[j]    
    xcm = xmsum/sum(m) 

    ymsum = 0
    for j in range(len(y)):
        ymsum = ymsum + y[j]*m[j]    
    ycm = ymsum/sum(m) 
    
    zmsum = 0
    for j in range(len(z)):
        zmsum = zmsum + z[j]*m[j]    
    zcm = zmsum/sum(m) 
    
    return xcm, ycm, zcm

In [3]:
%timeit f1_calc_com(x, y, z, m)

287 ms ± 5.01 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


## Version 1: Keep the Python Lists

In the cell below, modify the original code to make it run more efficiently, but without changing the Python lists to NumPy arrays.

__Tips:__

> - Consider whether there's a more efficient way to set up the initial conditions (for example, with list comprehensions)
> - See if it's possible to reduce the amount of looping required
> - Look for any unnecessary or repeated calculations that can be removed
> - Check for alternative ways to do certain calculations more efficiently
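
The tips above can be illustrated with a minimal sketch for a single coordinate. This is one possible direction rather than a reference solution: `n_particles` and `weighted_mean` are hypothetical names introduced here, and the system is deliberately much smaller than the one in the challenge.

```python
import random

# Hypothetical small system; the challenge itself uses 1,000,000 particles.
n_particles = 1000

# A list comprehension builds each list in one expression
# instead of calling append inside an explicit loop.
x = [random.uniform(0, 100) for _ in range(n_particles)]
m = [random.uniform(1, 10) for _ in range(n_particles)]


def weighted_mean(coords, masses, total_mass):
    """Mass-weighted mean of one coordinate, using plain Python lists."""
    # zip pairs each coordinate with its mass, avoiding index lookups,
    # and the built-in sum consumes the generator directly.
    return sum(c * w for c, w in zip(coords, masses)) / total_mass


# The total mass is the same for every coordinate, so compute it only once.
total_mass = sum(m)
xcm = weighted_mean(x, m, total_mass)
```

The same `total_mass` value could then be reused for the y and z coordinates rather than summing the masses again for each one.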

## Version 2: Convert Python Lists to NumPy Arrays

Use what you've learned about the ease (and efficiency) of working with NumPy arrays to re-calculate the center of mass in a more efficient way. For now, you can start with the Python lists already created (for x, y, z and m), and then proceed as follows:

> - Convert these lists of random numbers to NumPy arrays (you can use NumPy's `asarray` function)
> - Drop the `for` loops: you can calculate `xcm`, `ycm`, and `zcm` each with just a single line of code!
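
As a rough sketch of what these two steps might look like for one coordinate, the example below uses a tiny hand-checkable system in place of the million-particle lists. The names `x_arr`, `m_arr`, and `xcm_alt` are illustrative choices, and `np.average` with `weights` is shown as one alternative, not necessarily the approach the challenge intends.

```python
import numpy as np

# Tiny stand-in data; in the challenge these would be the existing
# Python lists x and m (and likewise y and z).
x = [1.0, 2.0, 4.0]
m = [2.0, 1.0, 1.0]

# asarray converts each list to a NumPy array.
x_arr = np.asarray(x)
m_arr = np.asarray(m)

# The element-wise product followed by a sum replaces the explicit for loop.
xcm = np.sum(x_arr * m_arr) / np.sum(m_arr)

# np.average with weights computes the same mass-weighted mean.
xcm_alt = np.average(x_arr, weights=m_arr)

print(xcm, xcm_alt)
```

Here the weighted sum is 2·1 + 1·2 + 1·4 = 8 and the total mass is 4, so both expressions evaluate to 2.0.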

## Time It!

Time both versions of your new code in the cell below to verify that NumPy arrays are the way to go when optimizing code!
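
The `%timeit` magic used earlier is only available in IPython and Jupyter. If you want to run a comparison as a plain Python script, the standard library `timeit` module offers a similar measurement. The sketch below times two toy functions rather than the challenge code; `sum_with_loop`, `values`, and `n_calls` are hypothetical names chosen for illustration.

```python
import timeit


def sum_with_loop(values):
    """Add up the values with an explicit loop."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [float(i) for i in range(10_000)]

# timeit.repeat returns one total time (in seconds) per repeat;
# dividing by the number of calls gives the time per call.
n_calls = 20
loop_times = timeit.repeat(lambda: sum_with_loop(values), number=n_calls, repeat=3)
builtin_times = timeit.repeat(lambda: sum(values), number=n_calls, repeat=3)

# The minimum over the repeats is the measurement least affected by other system activity.
print(f"loop:    {min(loop_times) / n_calls * 1e3:.3f} ms per call")
print(f"builtin: {min(builtin_times) / n_calls * 1e3:.3f} ms per call")
```

Exact timings depend on the machine, so compare the two numbers to each other instead of to any fixed value.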