# Exercise 02_A : Fusion Basics

Use the following equation to create an implementation that uses fusion and operator reuse to optimize the underlying code.

`result = A*C + B/D + ((D-C)/B)/(A*C) `

An example implementation is given with all operations done individually; how much faster can you make it?
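To make the idea concrete, here is a minimal CPU-only sketch in plain C++ (not MatX) of the same equation evaluated two ways. The function names `eval_unfused` and `eval_fused` are hypothetical and exist only for illustration. The unfused version makes a separate pass over the data and writes a full temporary for every operation, while the fused version makes one pass and reuses `A*C` from a local variable. On a GPU, the analogous difference would be several kernel launches with intermediate buffers versus a single kernel.

```cpp
#include <cstddef>
#include <vector>

// Unfused: every operation is its own loop and writes a full-size temporary.
std::vector<float> eval_unfused(const std::vector<float>& A, const std::vector<float>& B,
                                const std::vector<float>& C, const std::vector<float>& D) {
  const std::size_t n = A.size();
  std::vector<float> ac(n), bd(n), dcb(n), out(n);
  for (std::size_t i = 0; i < n; ++i) ac[i]  = A[i] * C[i];
  for (std::size_t i = 0; i < n; ++i) bd[i]  = B[i] / D[i];
  for (std::size_t i = 0; i < n; ++i) dcb[i] = (D[i] - C[i]) / B[i];
  for (std::size_t i = 0; i < n; ++i) out[i] = ac[i] + bd[i] + dcb[i] / ac[i];
  return out;
}

// Fused: one loop, no temporaries; the shared term A*C is computed once per element.
std::vector<float> eval_fused(const std::vector<float>& A, const std::vector<float>& B,
                              const std::vector<float>& C, const std::vector<float>& D) {
  const std::size_t n = A.size();
  std::vector<float> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    const float ac = A[i] * C[i];  // reused in the first and last terms
    out[i] = ac + B[i] / D[i] + ((D[i] - C[i]) / B[i]) / ac;
  }
  return out;
}
```

Both functions compute the same values; the fused one simply touches memory far less, which is typically where the speedup comes from for element-wise work like this.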

In [None]:
%%run_matx
auto exec = matx::CUDAExecutor();

matx::index_t size_x = 128;
matx::index_t size_y = 256;

auto A      = matx::make_tensor<float>({size_x, size_y});
auto B      = matx::make_tensor<float>({size_x, size_y});
auto C      = matx::make_tensor<float>({size_x, size_y});
auto D      = matx::make_tensor<float>({size_x, size_y});
auto result = matx::make_tensor<float>({size_x, size_y});

// ---- populate the data ---- //
(A = matx::random<float>(A.Shape(), matx::NORMAL)).run(exec);
(B = matx::random<float>(B.Shape(), matx::NORMAL)).run(exec);
(C = matx::random<float>(C.Shape(), matx::NORMAL)).run(exec);
(D = matx::random<float>(D.Shape(), matx::NORMAL)).run(exec);
(result = matx::zeros({size_x, size_y})).run(exec);
exec.sync();

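// ---- whole equation written as one expression ---- //
exec.start_timer();
// Parentheses around (A * C) are required: "/ A * C" would divide by A, then multiply by C.
(result = A * C + B / D + ((D - C) / B) / (A * C)).run(exec);
exec.stop_timer();
std::cout << "One Equation Runtime: " << exec.get_time_ms() << " ms" << std::endl;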

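// ---- fused implementation that reuses operators ---- //
exec.start_timer();
// These lines only build lazy operators; nothing is computed until run() is called.
auto term1 = A * C;         // shared by the first and last terms
auto term2 = B / D;
auto term3 = (D - C) / B;
auto term4 = term3 / term1; // reuses term1 rather than writing A * C again
(result = term1 + term2 + term4).run(exec);
exec.stop_timer();
std::cout << "Fused Operation Runtime: " << exec.get_time_ms() << " ms" << std::endl;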