DeSmuME is a Nintendo DS emulator
rogerman GPU: Make the overall functionality of CopyLineExpand() and CopyLineReduce() more complete. Also do some small optimizations to GPUEngineBase::_LineCopy() while I'm at it.

- GPUEngineBase::_LineCopy() optimizations only apply to 2x, 3x, and 4x scaling.
- Add SSE2 version of 3x CopyLineExpand() when using ELEMENTSIZE==1.
- Add SSE2 versions of CopyLineReduce() and add specific 2x/3x/4x versions of CopyLineReduce_*() algorithms.
- CopyLineExpand() now supports vertical scaling in addition to horizontal scaling.
- GPU buffers that were previously only cache-aligned are now page-aligned if appropriate.
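
For readers unfamiliar with the line-scaling routines named above, here is a minimal scalar sketch of the general idea: expanding one line by an integer factor both horizontally and vertically, then reducing it back. The names ExpandLineSketch and ReduceLineSketch, their parameters, and the nearest-neighbor approach are assumptions made for illustration. This is not the actual CopyLineExpand() or CopyLineReduce() code, and it leaves out the SSE2 paths entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, not DeSmuME's CopyLineExpand().
// Expands one source line of srcWidth elements into `scale` destination
// lines, each (srcWidth * scale) elements wide, by replicating elements.
// dst must have room for srcWidth * scale * scale elements.
template <typename T>
void ExpandLineSketch(const T *src, T *dst, size_t srcWidth, size_t scale)
{
    assert(scale >= 1);
    const size_t dstWidth = srcWidth * scale;

    // Horizontal scaling: repeat each source element `scale` times.
    for (size_t x = 0; x < srcWidth; x++)
    {
        for (size_t i = 0; i < scale; i++)
        {
            dst[(x * scale) + i] = src[x];
        }
    }

    // Vertical scaling: duplicate the expanded line into the rows below it.
    for (size_t y = 1; y < scale; y++)
    {
        std::memcpy(dst + (y * dstWidth), dst, dstWidth * sizeof(T));
    }
}

// Hypothetical helper, not DeSmuME's CopyLineReduce().
// Reduces a scaled line back to dstWidth elements by keeping the first
// element of every group of `scale` source elements.
template <typename T>
void ReduceLineSketch(const T *src, T *dst, size_t dstWidth, size_t scale)
{
    assert(scale >= 1);
    for (size_t x = 0; x < dstWidth; x++)
    {
        dst[x] = src[x * scale];
    }
}
```

Templating on the element type loosely mirrors the ELEMENTSIZE distinction mentioned above (1-byte, 2-byte, or 4-byte elements). Fixed 2x/3x/4x variants would let a compiler or hand-written SIMD unroll the inner loop, which is presumably why the commit adds specific versions for those factors.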
Latest commit acb1402 Sep 19, 2018